METHOD AND DEVICE FOR ORGANIZING USER PROVIDED INFORMATION WITH META-INFORMATION

The present invention relates to a method and a device for organizing user provided information
with meta-information. In particular, the user provided information is user provided audio
information. The meta-information is obtained from context information that is available in
conjunction with the user provided information and logically related thereto. The present
invention is especially useful in the field of mobile terminal devices and mobile network devices,
which increasingly have to handle huge amounts of data of different types and kinds that are
related to some extent.

The spread of mobile electronic devices within the population of industrial states is increasing
ever more rapidly, and in parallel the acceptance of those mobile electronic devices increases
dramatically due to the advantages they provide in comparison to classical paper-based
utilities. The capability of mobile electronic devices for collecting, acquiring, recording,
storing, providing and processing electronic information grows as a consequence of the increasing
processing and storing capability, such that mobile electronic devices will increasingly be
employed as universal information handling systems, including the "traditional" forms of usage
such as electronic calendar, electronic personal address book and electronic personal telephone
directory, and future forms of usage comprising digitized (electronic) information of any kind
and/or of any format.

Today, the acquiring of electronic information and digitally available multimedia data is
primarily based on retrieving electronic information from information providing and serving
systems, respectively. In the future, the increasing capabilities of mobile electronic devices will
result in abilities to produce "own" electronic information and digital multimedia data,
respectively, for example by the possibility to digitize legacy documents and to store the digitized
version, or by components of the mobile electronic devices that allow acquiring electronic
information like pictures/images, movies and digital audio information and storing those data in
electronic (digital) form.

As known from today's electronic information systems, an increasing potential for accessing and
retrieving information involves the problem of informational overload, which can be experienced
in a conventional office environment involving the usage of e-mail, electronic journals, faxes,
memos, messaging services, web-based information services etc., delivering a huge amount of
simultaneous information. The aforementioned increasing capability to acquire additional
electronic information will intensify the problem of informational overload.

The possibility to access, manage, retrieve etc. electronic information migrates to an increasing
extent from classical desktop systems (e.g. personal computers) to mobile electronic devices like
electronic organizers, personal digital assistants (PDA), mobile communication enabled
terminals and the like, such that the collecting of personal multimedia information relating to a
user and events in the user's life becomes a predominant feature. The personal multimedia
information may be understood to comprise information acquired by audio recording, voice
recording, taking pictures, taking videos, retrieving electronic information from any information
serving service (document/information server), receiving electronic information via any
transmitting service, transmitting bearer, etc.

Consequently, the possibility and opportunity to collect a huge amount of data is only valuable
provided the collected information is employable in an effective, meaningful, reliable and
user-friendly way, especially meeting the demands of a user in a cost-efficient way. The retrieval
of certain information is primarily based and dependent on a reliable and efficient organization of
the stored information, which is provided with indexing and/or labeling information used
for identifying associated information. An effective, meaningful, reliable and user-friendly
retrieval will be the only way to cope with the informational overload, since the huge amount of
available information will otherwise result in a confusing collection of pointless information.

An object of the present invention is to provide a method for managing user provided information
with meta-information. The meta-information is extracted from information provided and
available in the context of the user provided information. The meta-information is employed for
retrieval of user provided information. The management makes it possible to organize user
provided information for establishing a retrieval system employing the associated meta-information.

A further object of the present invention is to provide a method for managing user provided
audio information with meta-information, wherein the user provided audio information
represents a subset of the user provided information. The meta-information is provided
simultaneously to the user provided audio information.



The objects of the invention are achieved with a method for organizing user provided
information, a corresponding device adapted to perform the method, computer programs and
software tools which are disclosed in the independent claims. Preferred embodiments of the
invention are disclosed in the dependent claims.

According to a first aspect of the invention, a method for organizing user provided information
with meta-information is provided. The organized user provided information in conjunction with
the meta-information makes it possible to set up an information history functionality which allows
purposeful retrieval of the user provided information stored therein. The user provided
information is available in a mobile terminal device and is obtained for being processed. The user
provided information is to be understood as data which has been generated, received, or which
occurs in consequence of any user operation being performed on the mobile terminal device.
The user provided information may be generated by employing user input means or may be
received or obtained via any data communication interface provided by the mobile terminal
device. It shall be further understood that the user input means may be an arrangement of keys, a
keyboard, a keypad or any other manually operable inputting component (joystick, pen etc.), but
also enhanced input means such as an audio sensor (microphone), a digital camera capable of
taking photos and/or image sequences (videos) and further sensor input. The data communication
interface may be any interface able to receive, transmit or communicate electronic data of any
type with a counterpart device; in particular, transaction information and/or communication
information are comprised. Additionally, context information is obtained in parallel. The context
information is associated with the user provided information. It shall be understood that context
information includes, for example, information about a device application and/or device function
currently active and to which the user provided information is originally related, position/location
information, time information and any further information being available in parallel to the user
provided information and being related (chronologically and/or logically) to the user provided
information.

The user provided information and the associated context information are combined in accordance
with their logical and/or chronological relationship. The combining of the user provided
information and the associated context information is performed by obtaining individual
information from the context information to be used as meta-information. The user provided
information as well as the meta-information employed is stored in a history storage.

The meta-information serves to enable later retrieval of the user provided information in a
purposeful manner. In case of retrieval, a retrieval request includes request information. The
request information is compared with the meta-information and, in case of a match, the
associated user provided information is retrieved correspondingly from the history storage. The
context information represents an information pool of individual information items which are
usable as meta-information for the user provided information. In particular, this way of providing
meta-information may be employed in case the user provided information itself, e.g. multimedia
information like video sequences, audio sequences, images etc., includes information of a
type which evades retrieval due to the fact that suitable retrieving information is difficult to
define.
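By way of illustration only, the store-and-retrieve cycle described above may be sketched as follows. The class names `HistoryEntry` and `HistoryStorage`, the context keys and the exact-match comparison are assumptions made for this sketch and are not taken from the description.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryEntry:
    payload: bytes   # user provided information, e.g. an audio clip
    meta: dict       # meta-information taken from the context information


@dataclass
class HistoryStorage:
    entries: list = field(default_factory=list)

    def store(self, payload, context, keys=("application", "location", "time")):
        # Pick individual items out of the context pool to serve as meta-information.
        meta = {k: context[k] for k in keys if k in context}
        self.entries.append(HistoryEntry(payload, meta))

    def retrieve(self, **request):
        # An entry matches when every requested field equals its meta-information.
        return [e.payload for e in self.entries
                if all(e.meta.get(k) == v for k, v in request.items())]
```

A request such as `retrieve(application="dictation")` then returns only the payloads whose stored meta-information carries that value, regardless of the payload's own content.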

According to an embodiment of the invention, the meta-information is extracted from the
associated context information and the extracted meta-information is assigned to the user
provided information to be stored in the history storage. The meta-information comprises
individual information parts being included in the context information, wherein the individual
information parts are extracted in accordance with pre-defined and/or user-defined regulation(s)
which may vary depending on the kind of user provided information and depending on the
composition of the context information.

According to an embodiment of the invention, the user provided information is received from an
input means of the mobile terminal device as user input information. Further, the user provided
information is received from any data communication interface of the mobile terminal device as
communication information and/or transaction information.

According to an embodiment of the invention, labeling information and/or indexing information
is extracted from the context information. The labeling information and/or indexing information
is assigned to the user provided information and stored in combination therewith in order to
enable a retrieval of the stored user provided information by means of the assigned labeling
information and/or indexing information used for specifying the content of the stored user
provided information.

Analogously, the labeling information and/or the indexing information is compared
with request information in order to enable a purposeful retrieving, filtering and/or
sorting of user provided information provided by the user history functionality.

According to an embodiment of the invention, referencing information or addressing information
is generated, respectively. The referencing information is generated for at least a part of the user
provided information. The referencing information is stored in a first storage area, preferably in
conjunction with the meta-information which allows the retrieval, whereas the part of the user
provided information being referred to by the referencing information is stored in a second
storage area. The referencing information specifies the storage place of that part of the user
provided information and enables a retrieval thereof from the second storage area.

According to a second aspect of the invention, a method for organizing user provided
information with meta-information is provided. The meta-information allows for retrieving the
user provided information which is available in a mobile terminal device. The context
information obtained in conjunction with the user provided information comprises calendar
information which is provided by an electronic calendar functionality (i.e. an electronic
calendar application) implemented in the mobile terminal device. The calendar information and
the user provided information are matched in order to obtain meta-information from the calendar
information, wherein matching calendar information is assigned to the user provided information
to be stored in the history storage.

According to an embodiment of the invention, a first time information is obtained in accordance
with the user provided information, i.e. a moment in time, a period of time or the like being
related to the user provided information, e.g. the moment in time at which the user provided
information has been inputted/received.

Further, the calendar information includes a plurality of calendar entries, each of which is
associated with a second time information which relates analogously to a moment in time or a
period of time. The first time information is compared with each of the second time information
and, in case the first time information matches one of the second time information, at least a part
of the information comprised in the matching calendar entry is extracted for being employed as
meta-information.
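By way of illustration only, the comparison of the first time information with the second time information of each calendar entry may be sketched as follows. The entry layout (a dictionary with `start` and `end` plus descriptive fields) and the function name `match_calendar` are assumptions made for this sketch.

```python
from datetime import datetime


def match_calendar(first_time, calendar_entries):
    """Return meta-information from every entry whose period contains first_time."""
    meta = []
    for entry in calendar_entries:
        if entry["start"] <= first_time <= entry["end"]:
            # Only a part of the matching entry is extracted as meta-information.
            meta.append({k: entry[k] for k in ("what", "where", "who") if k in entry})
    return meta
```

The sketch treats a match as strict containment of a moment in time within an entry's period; softer matching is a separate refinement.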

Particularly, the user provided information shall be understood to be any multimedia information
operable with the mobile terminal device, for example a video sequence taken by a
digital camera plugged onto and implemented in the mobile terminal device or an audio sequence
recorded by a digital dictation apparatus also being plugged onto and implemented in the mobile
terminal device and the like.

According to an embodiment of the invention, a membership function (cf. fuzzy set theory) is
assigned to the second time information and a membership grade is obtained from this
membership function in accordance with the first time information. The membership grade is
employed to define a measure which allows estimating a reliability for retrieval.
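By way of illustration only, one possible membership function is a trapezoid that yields grade 1 inside the period of the calendar entry and falls off linearly outside it. The trapezoidal shape, the `margin` parameter and the use of plain numbers (e.g. minutes) for time are assumptions made for this sketch; the description does not prescribe a particular function.

```python
def membership_grade(t, start, end, margin):
    """Trapezoidal membership: 1 inside [start, end], falling linearly to 0 over `margin`.

    `margin` must be positive; all arguments use the same time unit.
    """
    if start <= t <= end:
        return 1.0
    distance = start - t if t < start else t - end
    return max(0.0, 1.0 - distance / margin)
```

A recording made shortly before or after a meeting thus still receives a non-zero grade, which can later serve as a reliability measure.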

According to an embodiment of the invention, the first time information defines a first period of
time. The user provided information is partitioned into a plurality of information parts. Each
information part has a maximal pre-defined period of time. Each information part is processed
separately in accordance with any of the aforementioned methods according to an embodiment of
the invention.
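By way of illustration only, the partitioning into parts of a maximal pre-defined length may be sketched as follows, with periods given as numeric `(start, end)` pairs. The function name `partition` and the numeric representation of time are assumptions made for this sketch.

```python
def partition(start, end, max_len):
    """Split the period [start, end) into consecutive parts of at most max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    parts = []
    while start < end:
        stop = min(start + max_len, end)
        parts.append((start, stop))
        start = stop
    return parts
```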

According to an embodiment of the invention, the first time information defines a first period of
time and the second time information defines a second period of time. In case the first period of
time exceeds the second period of time in length, i.e. the first period of time overlaps the second
period of time, the user provided information is sectioned into at least two information sections
such that one of the information sections relates to a period of
time which is a sub-period of the first period of time and fits into the second period of time. The
one information section which fits into the second period of time is processed in accordance with
any of the aforementioned methods according to an embodiment of the invention.
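By way of illustration only, the sectioning of a recording period against the period of a calendar entry may be sketched as follows. Periods are numeric `(start, end)` pairs, and the function name `section_recording` is an assumption made for this sketch.

```python
def section_recording(first, second):
    """Cut period `first` at the boundaries of `second`.

    Returns all sections of `first` and, separately, those that fit into `second`.
    """
    f_start, f_end = first
    s_start, s_end = second
    # Only boundaries of the second period that fall strictly inside the first cut it.
    inner = [c for c in (s_start, s_end) if f_start < c < f_end]
    cuts = sorted({f_start, f_end, *inner})
    sections = list(zip(cuts, cuts[1:]))
    inside = [(a, b) for a, b in sections if s_start <= a and b <= s_end]
    return sections, inside
```

Only the sections returned in `inside` would be matched against the calendar entry; the remaining sections can be matched against other entries or stored without calendar meta-information.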

According to an embodiment of the invention, the second period of time has certain pre-defined
overhead periods of time extending the second period to previous times and to subsequent times.

According to an embodiment of the invention, the calendar information, and particularly each
calendar entry comprised in the calendar information, is semantically structured,
i.e. the information comprised in each calendar entry is organized into several fields,
each relating to a certain information context like "who", "where", "when", "what", etc., which
allows reliable meta-information to be obtained from the calendar entries.

According to a third aspect of the invention, a method for organizing user provided audio
information with meta-information is provided. The user provided audio information is
associated with additional user provided information obtained during the recording of the user
provided audio information and completed with context information related to the recording of
the provided audio information. The user provided audio information is inputted into a mobile
terminal device by dictation of the user. The user provided audio information is received,
recorded and stored in the mobile terminal device. During the receiving, recording and storing of
the user provided audio information, the context information is received. The context information
comprises at least user input information operable by the user to control the recording process.
The context information is logged and meta-information is generated which is based on the
recording context information.

The recording context information is to be employed for retrieval of the stored user provided
audio information by matching request information provided with a retrieval request with the
recording context information for selecting the stored user provided audio information. The
recording context information additionally offers information relating to the additional user
provided information which relates to the user provided audio information.

According to an embodiment of the invention, the user input information which is received 
during the recording of the user provided audio information comprises control signals relating to 
the operating of the audio recording. On receiving a start signal the recording and storing of the 
user provided audio information is initiated. On receiving a keyword signal the receiving, 
recording and storing of a user provided audio keyword information is initiated. The one or more 
keywords comprised by the user provided audio keyword information relate to the user provided 
audio information being recorded and stored and allow to retrieve the stored user provided audio 
information. On receiving an attach signal, additional information, in particular multimedia
information received or provided by the mobile terminal device, is associated with the user 
provided audio information. On receiving a pause signal the recording and storing of the user 
provided audio information is paused. On receiving a continue signal the previously paused 
recording and storing of the user provided audio information is resumed. And on receiving a stop 
signal the recording and storing of the user provided audio information is stopped and the 
recording context information is generated. 
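By way of illustration only, the handling of the control signals may be sketched as a small state machine that logs each signal and produces the recording context information on stop. The class name `Recorder`, the state names, and the rules that keyword and attach signals are accepted only while recording and that stop is accepted while paused are assumptions made for this sketch.

```python
class Recorder:
    """Hypothetical controller that logs control signals as recording context."""

    TRANSITIONS = {
        ("idle", "start"): "recording",
        ("recording", "pause"): "paused",
        ("paused", "continue"): "recording",
        ("recording", "stop"): "idle",
        ("paused", "stop"): "idle",  # assumption: stopping while paused is allowed
    }

    def __init__(self):
        self.state = "idle"
        self.log = []

    def signal(self, name, time, data=None):
        if name in ("keyword", "attach"):
            # Assumption: keywords and attachments are accepted only while recording.
            if self.state != "recording":
                raise ValueError(f"{name!r} not allowed in state {self.state!r}")
        else:
            key = (self.state, name)
            if key not in self.TRANSITIONS:
                raise ValueError(f"{name!r} not allowed in state {self.state!r}")
            self.state = self.TRANSITIONS[key]
        self.log.append({"signal": name, "time": time, "data": data})
        if name == "stop":
            # The recording context information is generated when recording stops.
            context, self.log = {"events": self.log}, []
            return context
        return None
```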

According to an embodiment of the invention, the recording context information includes at
least information about the recorded user provided audio information, comprising information
about the user provided keyword audio information, time information about the user provided
audio information, the user provided keyword audio information, the user input information
being received during recording and information about the additional information which is
associated with the user provided audio information being recorded.

According to an embodiment of the invention, the generating of the recording context
information comprises an encoding of the recording context information, being the
meta-information, as a document based on a markup language such as the Extensible Markup
Language (XML).
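By way of illustration only, the encoding of logged recording events as an XML document may be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`recording`, `event`, `type`, `time`) and the event dictionary layout are assumptions made for this sketch; the description does not define a schema.

```python
import xml.etree.ElementTree as ET


def encode_context(events):
    """Serialize a list of logged events into an XML string."""
    root = ET.Element("recording")  # hypothetical root element name
    for e in events:
        node = ET.SubElement(root, "event", type=e["signal"], time=str(e["time"]))
        if e.get("data") is not None:
            # Payload such as a keyword or a reference to an attached item.
            node.text = str(e["data"])
    return ET.tostring(root, encoding="unicode")
```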

According to an embodiment of the invention, the meta-information relating to the recording
context information is displayed. The displaying is enabled by assigning graphical elements to
the information which is included in the recording context information. A dedicated graphical
element is assigned to each kind of information entry included in the recording context
information. The dedicated graphical elements are displayed to illustrate the content of the
recording context information and the inter-dependencies thereof in a graphical depiction
allowing a user to have a clear overview of the recording context information.

According to a fourth aspect of the invention, a method for organizing user provided audio
information by embedding meta-information into the user provided audio information is
provided. The user provided audio information is supplied to a mobile terminal device. A set of
code bases is defined. The set of code bases represents a plurality of coding symbols. Each code
basis of the set of code bases defines at least one predefined frequency. The meta-information is
mapped into said coding symbols and the mapped meta-information is included directly
into the user provided audio information, resulting in a combined user provided audio information
having embedded the meta-information.



According to an embodiment of the invention, a modulation signal is obtained from the mapped
meta-information and a modulated signal is obtained by combining the user provided audio
information and the modulation signal in the frequency domain. Further, the modulated signal is
combined with the user provided audio information in the time domain of the signals, resulting in
a user provided audio information which has embedded the meta-information coded in the
coding symbols provided by the set of code bases.

According to an embodiment of the invention, the set of code bases is defined in a certain
frequency sub-range being part of a total frequency range. The at least one frequency being
pre-defined in each one of the code bases is mapped to the further frequency sub-ranges within the
total applicable frequency range. For example, the certain frequency sub-range may be an octave
within which the code bases are defined. The frequency definitions of the code bases are mapped
correspondingly onto further octaves being within the total applicable frequency range. The total
frequency range may be defined by the audio processing capability of the mobile terminal device
and by the applicability in reference to the user provided audio information, respectively.
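By way of illustration only, the mapping of code frequencies defined within one octave onto further octaves may be sketched as follows, an octave shift being a multiplication by a power of two. The function name `map_to_octaves` and the limit of twelve octaves in either direction are assumptions made for this sketch.

```python
def map_to_octaves(base_freqs, f_min, f_max):
    """Repeat each base frequency in every octave that lies inside [f_min, f_max]."""
    mapped = set()
    for f in base_freqs:
        # Assumption: twelve octaves up and down cover any practical audio range.
        for k in range(-12, 13):
            shifted = f * 2.0 ** k
            if f_min <= shifted <= f_max:
                mapped.add(shifted)
    return sorted(mapped)
```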

According to an embodiment of the invention, the modulated signal is obtained by multiplying
the user provided audio information and the modulation signal in the frequency domain of the
signals. According to an embodiment of the invention, the user provided audio information
which has embedded the meta-information is obtained by adding the modulation signal and the
user provided audio information in the time domain of the signals.
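By way of illustration only, the time-domain addition may be sketched as follows. This is a deliberately simplified stand-in: it adds a low-level sinusoid at the code frequency of each symbol directly to the audio samples and omits the frequency-domain multiplication described above. The sample rate, frame length, gain and the two-entry `CODE_BASES` table are assumptions made for this sketch.

```python
import math

RATE = 8000                               # assumed sample rate in Hz
FRAME = 160                               # assumed number of samples per coding symbol
CODE_BASES = {"0": 1000.0, "1": 1500.0}   # hypothetical: one frequency per symbol


def embed(audio, symbols, gain=0.05):
    """Return a copy of `audio` with one code tone added per symbol frame."""
    if len(audio) < len(symbols) * FRAME:
        raise ValueError("audio too short for the given symbols")
    out = list(audio)
    for i, symbol in enumerate(symbols):
        freq = CODE_BASES[symbol]
        for n in range(FRAME):
            # Time-domain addition of the modulation signal to the audio samples.
            out[i * FRAME + n] += gain * math.sin(2 * math.pi * freq * n / RATE)
    return out
```

With these assumed values each frame holds a whole number of cycles of either code frequency, which keeps the two tones separable by a simple correlation.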

According to a fifth aspect of the invention, a method for extracting meta-information from a
user provided audio information which has embedded the meta-information is provided.
For this purpose, a correlation basis is provided. The correlation basis comprises all frequencies
which are defined in the code bases which have been employed for embedding the meta-information
into the audio information. The correlation basis is applied onto the user provided audio
information having embedded the meta-information to extract the meta-information.

According to an embodiment of the invention, a modulation signal is obtained in accordance
with the correlation basis, and the modulation signal is applied onto the user provided
audio information having embedded the meta-information in the frequency domain of the signals
by convolving the modulation signal and the user provided audio information having embedded
the meta-information, resulting in a correlation signal. Magnitude
values are extracted from the correlation signal. Each magnitude value corresponds to one code
basis of the set of code bases. The magnitude values are evaluated in order to reconstruct the
meta-information being embedded in the audio information.

According to an embodiment of the invention, the evaluation further comprises obtaining ratio
values. A ratio value is calculated from each pair of magnitude values. The ratio values are
normalized, for example by normalizing with respect to an average ratio value. The evaluation of
the normalized ratio values allows the meta-information being embedded in the audio
information to be reconstructed.
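By way of illustration only, the evaluation of magnitude values and of their ratio may be sketched as follows. The sketch correlates each frame with a complex exponential at each code frequency (a single-bin discrete Fourier transform) instead of the frequency-domain convolution described above, and it decides each symbol from one unnormalized ratio. The sample rate, frame length and the two-entry `CODE_BASES` table are assumptions made for this sketch.

```python
import cmath
import math

RATE = 8000                               # assumed sample rate in Hz
FRAME = 160                               # assumed number of samples per coding symbol
CODE_BASES = {"0": 1000.0, "1": 1500.0}   # hypothetical: one frequency per symbol


def magnitude(frame, freq):
    """Magnitude of the correlation of `frame` with a tone at `freq`."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / RATE)
              for n, x in enumerate(frame))
    return abs(acc)


def extract(audio, n_symbols):
    """Recover one symbol per frame from the ratio of the two magnitudes."""
    symbols = []
    for i in range(n_symbols):
        frame = audio[i * FRAME:(i + 1) * FRAME]
        m0 = magnitude(frame, CODE_BASES["0"])
        m1 = magnitude(frame, CODE_BASES["1"])
        ratio = m1 / (m0 + 1e-12)   # small constant avoids division by zero
        symbols.append("1" if ratio > 1.0 else "0")
    return "".join(symbols)
```

Deciding on ratios rather than on absolute magnitudes makes the decision independent of the overall level of the audio signal.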

According to an embodiment of the invention, the correlation signal is obtained by convolving
the modulation signal and the audio information in the frequency domain of the signals. The
modulation signal is obtained for the correlation basis. The correlation basis is defined within a
certain frequency sub-range being one of a plurality of frequency sub-ranges forming the total
applicable frequency range. The correlation basis is mapped to each of the frequency sub-ranges
comprised by the total frequency range, analogously to the mapping of the code bases in
conjunction with the coding of the meta-information.

According to a sixth aspect of the invention, a method for retrieving user provided information
provided by the information history functionality is provided. The user provided information is
organized for retrieval in accordance with the method for organizing user provided information
of a mobile terminal device according to an embodiment of the invention. A request is received.
The request includes request information which instructs to retrieve certain user provided
information identifiable by comparing request information with meta-information (labeling
information and/or indexing information) having been assigned to the user provided information
organized in the history storage. In case of a match, the user provided information is retrieved
accordingly and a response, including the retrieved user provided information, is returned.

The request may further instruct to filter and/or sort the retrieved user input information. The
filtering may be operated in accordance with filtering information being included in the request
and the filtering may be operated in conjunction with the meta-information (labeling information
and/or indexing information) being assigned to the user provided information. The sorting may
be operated in accordance with sorting information being included in the request and the sorting
may be operated analogously in conjunction with the meta-information (labeling information
and/or indexing information).

The matching of request retrieval information and the meta-information may require performing
the aforementioned method for extracting meta-information from a user provided audio
information which has embedded the meta-information according to an embodiment of the
invention.

According to an embodiment of the invention, the retrieving of the user provided information
comprises a retrieving of a referencing information or address information being associated with
the user provided information to be retrieved and retrieving the user provided information in
accordance with the referencing information or address information from a second storage area.
The referencing information may be stored in a first storage area in conjunction with the
meta-information which allows the retrieval.
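By way of illustration only, the two-step retrieval via referencing information may be sketched as follows. Two dictionaries stand in for the first and second storage areas, and a random identifier stands in for the referencing information; these choices and the function names are assumptions made for this sketch.

```python
import uuid

first_area = {}    # reference -> meta-information (small, searchable)
second_area = {}   # reference -> bulky user provided information


def store_with_reference(payload, meta):
    """Store the payload in the second area and its meta-information in the first."""
    ref = uuid.uuid4().hex   # hypothetical referencing information
    second_area[ref] = payload
    first_area[ref] = dict(meta)
    return ref


def retrieve_by_meta(**request):
    """Look up matching references first, then fetch the referenced payloads."""
    refs = [r for r, m in first_area.items()
            if all(m.get(k) == v for k, v in request.items())]
    return [second_area[r] for r in refs]
```

Keeping only references and meta-information in the first area lets the matching run without touching the larger payloads, which may reside on a different device.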

According to an embodiment of the invention, membership grades being associated with user
provided information being handled by the information history functionality may be used to
derive a reliability measure of the matching user provided information stored in the history
storage. The reliability measure may serve for filtering the matching user provided information,
e.g. by rejecting matching user provided information being associated with a membership grade
below a pre-defined / user-defined membership grade threshold, or may serve for sorting the
matching user provided information in a sequence corresponding to the individual membership
grades.
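By way of illustration only, the filtering and sorting by membership grade may be sketched as follows. The representation of each match as a dictionary with a `grade` field and the descending sort order are assumptions made for this sketch.

```python
def filter_and_sort(matches, threshold):
    """Reject matches graded below `threshold` and sort the rest, best first."""
    kept = [m for m in matches if m["grade"] >= threshold]
    return sorted(kept, key=lambda m: m["grade"], reverse=True)
```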

According to a seventh aspect of the invention, a method for providing storage capacity for
organizing user provided information is provided. The user provided information is organized
with meta-information according to any one of the aforementioned methods with respect to any
embodiment of the invention. Basically, the method relates to the providing of storage capacity
for storing and the providing of retrieval capacity for retrieving the previously stored user
provided information.

The providing of the storage capacity is performed by receiving a request for storing user
provided information or at least a part of user provided information. For this purpose, the request
includes the part of the user provided information to be stored and referencing information also
to be stored. The referencing information serves to allow a later retrieval by means of the
referencing information. Accordingly, the part of the user provided information included in the
request and the referencing information are stored in an adequate way.

The providing of the retrieval capability is performed by receiving a request for retrieving at least
a part of user provided information. For this purpose, the request comprises referencing
information which references (uniquely) the part of the user provided information to be retrieved.
The retrieval of the part of the user provided information employs the referencing information
which has been stored during the storing of the part of the user provided information. A request
response is coded on the basis of the retrieved part of the user provided information and
transmitted back to the requesting device.

According to an embodiment of the invention, a software tool for organizing and
retrieving of user provided audio information with meta-information is provided. The software
tool comprises program portions for carrying out the operations of any aforementioned method
according to any embodiment of the invention when the software tool is implemented in a
computer program and/or executed.

According to an embodiment of the invention, there is provided a computer program for
organizing and retrieving of user provided audio information with meta-information. The
computer program comprises loadable program code portions for carrying out the operations of
any aforementioned method according to any embodiment of the invention when the program is
executed on a processing device, a computer or a network device.

According to an embodiment of the invention, a computer program product is provided which
comprises program code portions stored on a computer readable medium for carrying out the
aforementioned methods for organizing and retrieving of user provided information with
meta-information according to any embodiment of the invention when the program product is
executed on a processing device, a computer or network device.

According to an embodiment of the invention, a computer data signal is provided. The computer
data signal is embodied in a carrier wave and represents a program or program code portions
which, when executed by a processor, cause the aforementioned method for organizing and
retrieving of user provided information with meta-information according to any embodiment of
the invention to be carried out.

According to an eighth aspect of the invention, a mobile terminal device is provided which is
capable of organizing user provided information with meta-information in accordance with any one
of the aforementioned methods according to embodiments of the invention. The mobile terminal
device comprises a component for obtaining (receiving) the user provided information and the
context information being associated with the user provided information. Further, the mobile
terminal device comprises a component for assigning meta-information being obtained from the
context information to the user provided information. A component for storing, a storage device,
receives and stores the meta-information and the user provided information.

The user provided information is to be understood as data which has been generated, received, or
which occurs in consequence of any user operation being performed on the mobile terminal
device. The user provided information may be generated by employing user input means or may
be received or obtained via any data communication interface provided by the mobile terminal
device. It shall be further understood that the user input means may be an arrangement of keys, a
keyboard, a keypad or any other manually operable inputting component (joystick, pen etc.), but
also enhanced input means such as an audio sensor (microphone), a digital camera capable of
taking photos and/or image sequences (videos) and further sensor input. The data communication
interface may be any interface able to receive, transmit or communicate electronic data of any
type with a counterpart device; in particular, transaction information and/or communication
information are comprised. The context information is associated with the user provided
information. It shall be understood that context information includes, e.g., information about a
device application and/or device function currently active and to which the user provided
information is originally related, position/location information, time information and any further
information being available in parallel to the user provided information and being related
(chronologically and/or logically) to the user provided information.

According to a ninth aspect of the invention, a mobile terminal device being capable of organizing
user provided information with meta-information obtained from calendar information is
provided. The mobile terminal device is capable of performing any one of the aforementioned
methods according to embodiments of the invention. The mobile terminal device comprises a
component for obtaining calendar information from an electronic calendar application
implemented in the mobile terminal device. A component for matching matches the
calendar information and the user provided information, and a component for obtaining
meta-information obtains meta-information from the matching calendar information.

According to a tenth aspect of the invention, a mobile terminal device is provided which is
capable of organizing user provided audio information with meta-information. The mobile terminal
device is capable of performing any one of the aforementioned methods according to embodiments
of the invention. The mobile terminal device comprises a component for obtaining the user
provided audio information and for obtaining additional user provided information being
provided in parallel to the recording and storing of the user provided audio information, a
component for recording the user provided audio information and a storage component for
storing the user provided audio information. Further, a component for obtaining context
information is comprised. The context information includes user input information generated in
consequence of a user action during the recording and storing of the user provided audio
information. The context information is logged and supplied to a component for generating
meta-information comprising information about the additional user provided information being
obtained and the context information being logged.
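
The logging of user actions during an audio recording can be illustrated by a minimal sketch. It is not taken from the specification: the class name RecordingSession, the event fields and the injectable clock are assumptions chosen for illustration. Each user action is stored with its time offset from the start of the recording, and the logged entries are then summarized as meta-information.

```python
import time


class RecordingSession:
    """Hypothetical helper: logs user actions relative to the start of a recording."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the sketch can be tested
        self._start = clock()        # moment at which the audio recording starts
        self.events = []

    def log_action(self, action: str, detail: str = "") -> None:
        # Context information: what the user did and when, relative to the recording.
        offset = self._clock() - self._start
        self.events.append({"offset_s": offset, "action": action, "detail": detail})

    def meta_information(self) -> dict:
        # Meta-information generated from the logged context information.
        return {"event_count": len(self.events), "events": list(self.events)}
```

With such a log, a later playback function could jump to the position in the recording at which, for instance, a key was pressed or a photo was taken.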

According to an eleventh aspect of the invention, a mobile terminal device being capable of
organizing user provided audio information with meta-information by embedding
meta-information into the user provided audio information is provided. The mobile terminal device is
capable of performing any one of the aforementioned methods according to embodiments of the
invention. The mobile terminal device comprises a set of code bases. The code bases represent a
plurality of coding symbols. Each code basis of said set of code bases defines at least one
frequency. A component for mapping maps the meta-information into said plurality of coding
symbols, and a component for obtaining user provided audio information having embedded the
meta-information is able to embed the mapped meta-information into the user provided audio
information.
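
The mapping of meta-information into coding symbols, each defined by a frequency, may be sketched as follows. This is only an illustration under stated assumptions: the four frequencies in CODE_BASES, the 2-bit symbol size, the symbol length and the gain are hypothetical values, and the function names are not taken from the specification.

```python
import math

# Hypothetical set of code bases: each coding symbol (a 2-bit value) is
# assigned one frequency in Hz. The concrete values are assumptions.
CODE_BASES = {0: 17000.0, 1: 17500.0, 2: 18000.0, 3: 18500.0}


def map_to_symbols(meta: bytes) -> list:
    """Split each byte into four 2-bit coding symbols, most significant first."""
    symbols = []
    for byte in meta:
        for shift in (6, 4, 2, 0):
            symbols.append((byte >> shift) & 0b11)
    return symbols


def embed(audio, symbols, sample_rate=44100, symbol_len=441, gain=0.05):
    """Add one low-amplitude tone burst per symbol to a copy of the audio samples."""
    out = list(audio)
    for i, sym in enumerate(symbols):
        freq = CODE_BASES[sym]
        start = i * symbol_len
        for n in range(symbol_len):
            pos = start + n
            if pos >= len(out):
                return out  # audio too short: remaining symbols are dropped
            out[pos] += gain * math.sin(2 * math.pi * freq * n / sample_rate)
    return out
```

Recovering the symbols would require detecting which of the code base frequencies dominates in each symbol interval; that step is omitted here.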

According to a twelfth aspect of the invention, a storage device capable of providing storage
capacity for organizing user provided information being provided with meta-information is
provided. The storage device comprises an interface component for receiving a request for
storing, for receiving a request for retrieving and for transmitting a response in consequence of
said request for retrieving. The request for storing is a request for storing at least a part of user
provided information being transmitted from a mobile terminal device which performs a method
for organizing user provided information with meta-information according to any one of the
claims 1 to 29. This request comprises the part of the user provided information and referencing
information to be stored. The request for retrieving is a request for retrieving at least a part of
user provided information transmitted from the mobile terminal device which performs a method
for organizing user provided information with meta-information according to any one of the
claims 1 to 29. That request comprises referencing information allowing the requested
part of the user provided information to be retrieved.

WO 2004/057892 PCT/IB2002/005563

A storage component is capable of storing the user provided information and the referencing
information such that the user provided information is retrievable in conjunction with the
referencing information. A component for retrieving retrieves the part of the user
provided information which is requested by the request for retrieving in accordance with the
referencing information being included therein, and a component for generating a response
prepares the response including the retrieved part of the user provided information to be
transmitted to the requesting mobile terminal device.
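
The request handling of such a storage device can be sketched in a few lines. The sketch is an assumption-laden illustration only: the class name StorageDevice, the method names and the response layout are hypothetical, and an in-memory dictionary stands in for the actual storage component.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Hypothetical stand-in for the storage device; all names are illustrative."""
    _store: dict = field(default_factory=dict)

    def handle_store(self, reference: str, part: bytes) -> None:
        # Request for storing: carries the information part and its referencing information.
        self._store[reference] = part

    def handle_retrieve(self, reference: str) -> dict:
        # Request for retrieving: carries only the referencing information.
        part = self._store.get(reference)
        # Response prepared for transmission to the requesting terminal device.
        return {"reference": reference, "found": part is not None, "part": part}
```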

The invention will be described in greater detail by means of embodiments with reference to the
accompanying drawings, in which:

Fig. 1a shows a block diagram illustrating the structural organization and interaction of a
personal logbook according to an embodiment of the invention;

Fig. 1b shows a block diagram illustrating components allowing to constitute a personal
logbook according to an embodiment of the invention;

Fig. 1c shows a block diagram illustrating a suitable device environment which comprises
further storage devices cooperating with the personal logbook according to an
embodiment of the invention;

Fig. 2a shows a flow chart depicting an operational sequence for operating and in
particular for establishing logged information in a logbook component such as
illustrated in Fig. 1b according to an embodiment of the invention, respectively;

Fig. 2b shows a flow chart depicting an operational sequence for operating and in
particular for retrieving information from a logbook component such as illustrated
in Fig. 1b according to an embodiment of the invention, respectively;

Fig. 3a depicts schematic calendar entries providing a semantic field structure for
inputting information;

Fig. 3b shows a plot diagram illustrating example membership functions used for
associating a membership grade to certain events according to an embodiment of
the invention;

Fig. 3c shows a flow chart depicting an operational sequence for generating one or more
labeling / indexing information according to an embodiment of the invention;

Fig. 3d shows a plot diagram illustrating example membership function plots
corresponding to the calendar entries depicted in Fig. 3a and several timestamps of
different individual multimedia data according to an embodiment of the invention;

Fig. 3e shows a depiction of context information of a selection of individual multimedia
data including labeling / indexing information in accordance with the calendar
entries depicted in Fig. 3a and obtained in accordance with an embodiment of the
present invention;

Fig. 4a shows a flow chart depicting an operational sequence for recording audio
information and for associating additional information of any type to the audio
information according to an embodiment of the invention;

Fig. 4b shows a sequence of operations in time illustrating an example usage sequence
according to an embodiment of the invention;

Fig. 4c shows a possible example description in view of the example usage sequence
presented in Fig. 4b and according to an embodiment of the invention;

Fig. 4d shows a block diagram illustrating graphically a depiction of a protocol /
description information according to an embodiment of the invention;

Fig. 4e shows a table depicting a textual oriented depiction of a protocol / description
information according to an embodiment of the invention;

Fig. 5a shows a plot diagram illustrating an example frequency range of an audio
recording/reproducing system including several complete octaves;

Fig. 5b shows a set of plot diagrams depicting different coding methods according to
an embodiment of the invention;

Fig. 5c shows a block diagram depicting an embedding of information into an audio
signal according to an embodiment of the invention;

Fig. 5d shows a block diagram depicting a recovering of information being embedded in
an audio signal according to an embodiment of the invention.

Same or equal parts, features and/or operations shown in the figures will be referred to using the 
same reference numerals. The following description of embodiments of the present invention is 
partitioned into independent sections each of which relates to one certain aspect of the overall 
inventive concept. 

I. Personal Logbook

Users of electronic terminal devices develop personal usage patterns based on their preferences
and recurring needs. An example based on mobile communication terminals is the "last
calls" function, where a list of phone numbers consisting of a predefined number of put through
telephone calls is presented to users, through which the users are allowed to scroll to select one of
the presented list entries. The list of phone numbers is often associated with the personal
telephone directory established and edited by the users themselves. Users tend to select one of
those numbers quite often to initiate a new call. The idea in the case of the phonebook is to store
the last called numbers and offer them also for future use.

This concept will be generalized to other usage domains and will be designated as "personal
logbook" in the following. Fig. 1a illustrates the structural organization and interaction of a
personal logbook according to an embodiment of the invention by means of a conceptual block
diagram. The block diagram depicts three example information streams, the information of which
is passed to the personal logbook 200 for being logged, combined, classified, labeled, indexed
and provided to be retrieved. The three example information streams include a user input
information stream 101, a context information stream 100 and a transaction / communication
details information stream 102.

The basic input parameters and properties of the personal logbook 200 are user input
information and result information, e.g. resulting from the processing of the user input
information, in conjunction with available and suitable context information. Typically, a user input is
based on the input means provided by an electronic device such as a PDA, an electronic
organizer, a mobile communication terminal device and related electronic data processing
devices. User input means are operable by the user and the electronic device receives user input
information corresponding to the operation of the input means. Correspondingly, one or several
user inputs or input sequences operated by the following example input means are suitable for
being logged:

- keys, keypad, keyboard: keyboard/keypad input, where the individual character inputs are
aggregated and which are to be associated with the application context, i.e. the currently
running device application or the currently executed device function to which the
keyboard/keypad input is passed on for being processed, for being parsed to effect a desired
operation and the like. For example, a user input comprises an aggregated input of a character
sequence representing a complete text string for a form field whose title is known from the
application context, such as a string which is supplied to a WAP browser to be parsed and
interpreted as a URL address information;

- optical pen: pen input, where the coordinate information may either be passed to an optical
character recognition first to be translated into text based information or represent a free
graphics input being a sketch. For example, PDAs offer pen input capability to users for
operating the PDAs;

- audio detector (microphone): voice input, where the voice input is recorded as a voice memo
to be reproduced later, e.g. in the manner of a dictation machine, or the voice input is passed to a
voice recognition which returns text information corresponding to the voice input. Such a
voice recognition may be implemented directly in the electronic device having voice
recognition capability or may be realized as a networked voice recognition service accessible
via wireless or wired communications;

- digital image receiver (CCD camera): image / video input, where images or image sequences
(video) are stored directly or where only symbols are stored which are gained from the image
/ video input and represent recognized gestures;

- sensors: additional sensor input, such as acceleration sensor information etc.;

- and further user inputs which can be received from input receiving means of the electronic
device and which are suitable to control functions and/or applications thereof. For example,
further adequate input means may be any user operable input means such as a joystick, touch
pad etc.

Naturally, the user input information stream 101 is not limited to the aforementioned information
being received in consequence of a user input. The user input information stream 101 may be
understood as user input information being obtained (received) in consequence of any user
operation on the electronic device, which also includes any information obtained (received)
via any interface providing data exchange capabilities with any other counterpart device.

The context data/information is represented by additional information and conditions under
which the user input is carried out or is obtained from the current operation state in which the
electronic device is currently active. The context information may comprise:

- position information of the device (if available), where the position information may be
conveyed from a radio access network (location service enabled radio access network) or
from a satellite positioning system (global positioning system);

- time information of activity, i.e. information about the time at which the user input is
received, information about the duration of time during which the user input is received,
information about the time at which the user input starts/stops and related time information;

- information about the active application which has input / output focus during the user input,
which receives the user input and, in more detail, to which feature of the active application the
user input is dedicated;

- environmental details (if available) like e.g. temperature, humidity / weather conditions,
lighting conditions;

- information about the mood of the user (if available);




- and all other information which can be obtained by means of the electronic device and which
is suitable as context-defining parameters. This especially includes information available in
conjunction with applications being operated on the electronic device such as a calendar application,
telephone directory, address book application and related applications being employed for
organizing dedicated information.

The transaction / communication details information is the third information stream 102 to be fed
to the personal logbook 200. The transaction / communication details information may comprise
the following information:

- identity information about the sender of a message or a call which has been put through, e.g.
calling phone number, vCard of sender, mood of sender (cf. context information);

- information about e-commerce transaction details like recipient, (bank) account information,
order description and pricing; and further

- status information about transactions including tracing information of transactions such as
information about an order life cycle like order placed - order acknowledged / accepted -
delivery started - delivery on hold (e.g. a download) - delivery resumed - delivery finished -
order billed - order paid.

The transaction / communication details information may be regarded as being part of the context 
information. 

The presented classification of available information into three different information streams is
to be understood as an example embodiment according to the concept of the present invention to
illustrate a possible but not necessary classification of the information being logged by the
personal logbook 200. It is to be noted that the classification of information which
appears in conjunction with the operation of an electronic device and which is capable of being
logged into different information streams may be difficult, since certain information may be
associated simultaneously with different information streams. The presented concept is also
operable in conjunction with a single information stream including one, several selected or all
available information. Further, the single information parts assigned to one of the presented
example information streams represent single example information parts according to an
embodiment of the invention. The inventive concept shall not be understood as being limited to
these mentioned single parts of information.



The electronic device being in control of these information streams 100, 101 and 102 supplies
the information streams to the personal logbook 200. Conceptually, the task of the personal
logbook 200 according to an embodiment of the invention may be split into three different
subtasks:

- Obtaining / receiving and logging user input information, context information, transaction
information and further available information and processing the logged information for
being stored.

This subtask may be performed automatically without requiring further user activities, i.e.
such a logging operation may be a background activity. The information being obtained /
received and logged may be determined by pre-defined and/or user-defined regulations
defining the kind of information to be obtained / received and logged.

- Maintaining and sorting information stored in the personal logbook in accordance with
user-specific criteria or system-specific criteria. This subtask may also be a background activity
and the maintaining and/or sorting criteria may be pre-defined by the user.

- Retrieving information in consequence of one or more retrieving instructions specifying
certain filtering and/or sorting criteria to be applied for the retrieving of information from the
personal logbook 200 and providing (presenting) the retrieved information. The one or more
retrieving instructions may be transmitted from one or more applications 300 to the logbook
200 and the logbook 200 transmits the retrieved information back thereto.
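
The three subtasks listed above can be sketched as a small class. This is a minimal illustration under assumptions, not the described implementation: the class and method names, the record layout and the representation of the logging regulation as a set of permitted information kinds are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalLogbook:
    # Logging regulation: the kinds of information to be logged (assumed names).
    regulation: set = field(
        default_factory=lambda: {"user_input", "context", "transaction"})
    records: list = field(default_factory=list)

    def log(self, kind: str, timestamp: float, payload: dict) -> bool:
        """Subtask 1: store an item only if the regulation covers its kind."""
        if kind not in self.regulation:
            return False
        self.records.append({"kind": kind, "time": timestamp, **payload})
        return True

    def maintain(self, key: str = "time") -> None:
        """Subtask 2: keep the stored records sorted by a chosen criterion."""
        self.records.sort(key=lambda r: r[key])

    def retrieve(self, **criteria) -> list:
        """Subtask 3: return the records matching all given filter criteria."""
        return [r for r in self.records
                if all(r.get(k) == v for k, v in criteria.items())]
```

In a device, log and maintain would run as background activities, whereas retrieve would be called by an application in response to a user request.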

The operation of the personal logbook 200 can be expressed in short by the following
description. The personal logbook 200 automatically obtains / receives and stores all user input,
the application context, location context and time context where this happened and all
communication-related information from outside. The information is sorted in accordance with
different (pre-defined/user-defined) criteria to facilitate a retrieving access to the stored
information by the user. For example, the user is able to trace his or her input / transactions since
follow-up activities are most likely based on this context.

In view of the above described capabilities of the personal logbook 200, the advantages with
respect to the information overload dilemma can be recognized. The technical pre-conditions to
overcome the information overload dilemma are provided by the logbook, which allows storing
all information which is encountered in the daily life of a user, starting from documents, through
audio recordings up to full-day videos of the user's life. This information, and especially multimedia
information, may be useless without the implementation of the personal logbook 200, which allows
this information to be organized automatically or at least semi-automatically. Otherwise the user
cannot concentrate on the relevant content and use it in the future.

Fig. 1b illustrates components allowing to constitute such a personal logbook 200 depicted in
Fig. 1a according to an embodiment of the invention by means of an example embodiment of the
personal logbook 200. The personal logbook 200 may be implemented by means of an
information receiving interface or information obtaining component 201, a labeling/indexing
component 202, a referencing component 203 and a logbook storage 210. These components
201, 202, 203 and 210 relate to the first and second subtasks which are mentioned above in view
of the principal operation of the personal logbook. The depicted logbook storage 210 may be
supported by an external logbook storage 230 which may be connected to the personal logbook
200 and the logbook storage 210 by a storage management component 213, respectively, which
allows information to be moved thereto and retrieved therefrom. Moreover, the
logbook storage 210 may be completely substituted by the external logbook storage 230 being
operated as the only storage component for storing logbook related information.

Further, the personal logbook 200 includes components which allow information to be retrieved from
the logbook storage 210 and/or, if present and employed for storing, from the external logbook
storage 230. The request interface 220 is able to receive one or more requests for information
from e.g. an application which has generated an information retrieving request, for example in
consequence of a user input. Such a request includes one or more commands instructing to
retrieve certain information, e.g. in accordance with one or more retrieving, sorting and/or
filtering criteria. A logbook access component 221 is responsible for the access to the logbook
storage 210 as well as to the possibly available external logbook storage 230. A response
containing information from the logbook storage 210 and external logbook storage 230,
respectively, corresponding to the request for information is generated in cooperation with the
logbook access component 221 and the response interface 222 to be transmitted to one or more
applications which are capable of receiving the response or to be returned to the application which
has initiated the request for information, respectively.

Fig. 1c illustrates a suitable device environment which comprises storage solutions / storage
components and storage devices cooperating with the personal logbook according to an
embodiment of the invention. The aforementioned external logbook storage 230 may be realized
in a large number of ways. The depicted device environment shall illustrate a couple of
embodiments of the external logbook storage 230 without limiting thereto. The device
environment comprises an electronic device 10 having implemented the personal logbook
according to an embodiment of the invention, a detachably connected storage component 11, an
external storage device 12 being coupled to the electronic device 10, a local external storage
solution 13 and a networked external storage solution 17, each offering storage space/capacity for
being used in conjunction with an embodiment of the personal logbook according to the
invention.

The detachably connected storage component 11 can be realized technically in the form of any
memory cards (SD card, MultiMedia card, ...) being based on FLASH memory, magnetic
memory, ferrite memory etc., which are today employed in mobile electric device products, or in
the form of any once-writable or re-writable storage media, such as optical recording media (e.g.
CD-R, CD-RW, a (once-/re-)writable DVD media according to any DVD standard), magnetic
recording media and the like. Any known mass storage media are suitable for being used.
Correspondingly, such a mass storage technology may be employed as an external storage device
12 being coupled to the electronic device 10 via a wired or a wireless connection. Alternatively
to a dedicated storage component 11 or a dedicated external storage device 12, a processing
device 13 having implemented a mass storage component, such as a desktop computer or a
notebook, may be employed in accordance with the illustrated and described external logbook storage
230. Analogously, a wireless or wired communication connection provides communication with
such a processing device 13. Additionally, a networked storage device 17 connected via a
wireless or wired communication network 15 to the electronic device 10 may provide a dedicated
storage space for being employed as external logbook storage 230.

Any of the wired and/or wireless connection technologies may be used for realizing
communication of the electronic device 10 with one of the illustrated external devices 12 and 13.
For example, the universal serial bus, the FireWire bus or any other serial but also parallel bus
may be used for a wired connection, whereas Bluetooth, wireless local area network, wireless local
loop, infrared communication and related wireless communication standards allow realizing a
wireless communication 14.

Such a networked storage solution as illustrated in conjunction with the networked storage
device 17 may be provided by a networked storage service which may be accessed via the
internet 18, a wide area network, a local area network, a public land mobile network 16 (such as
GSM, PCN, DCS, PCS, UMTS, etc.) or any combination thereof. The networked storage device
17 has the advantage that the provided storage capacity may not be limited due to any storage
capacity limitations as given in conjunction with detachable storage media and the provided
storage capacity may be always available (on condition that a communication link to the
networked storage device 17 is available).

A more detailed description of the operations of the personal logbook 200 and the components
comprised in the personal logbook 200 will be discussed in combination with the following
operational sequences relating to logging / maintaining of the personal logbook 200 and the
retrieving of certain logged information from the personal logbook 200.

Fig. 2a shows a flow chart depicting an operational sequence for operating and in particular for
establishing logged information in a logbook component such as illustrated in Fig. 1b according
to an embodiment of the invention, respectively.

In an operation S100, the logging operation of the personal logbook 200 is started.

In operation S110, operation S120 and operation S130, user input information, context
information and/or transaction information / communication details information is obtained and
received, respectively. In view of the block diagram, the personal logbook 200 and in particular
the information receiving interface 201 receives this set of information. In accordance with the
current operation and/or the operational state of the electronic device having implemented the
personal logbook 200, only a part of the aforementioned information or additional information
may be available for logging. The kind of information to be logged by the personal logbook 200
may be pre-defined or user-defined, and all available information in accordance with the (pre-,
user-) defined logging regulations is logged.

In an operation S140, the received information including at least a part of user input information,
context information and/or transaction information / communication details information is
combined to a set of information. That means that simultaneously received information,
logically related and/or logically associated information and the like is brought into relationship and
formed into a set of information resulting in an information record. Received user input
information is associated with received context information by considering time information or
simultaneous receipt thereof, respectively. Received communication details information
and/or transaction information are associated with received context information by considering
time information or simultaneous receipt thereof, respectively. Moreover, communication
details information and/or transaction information may be received in a time interval and may be
related by logical relationship.
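
The time-based association of operation S140 may be sketched as follows. The sketch rests on assumptions: items are represented as dictionaries carrying a "time" field in seconds, and "simultaneous" is approximated by a hypothetical tolerance window; the specification does not prescribe either.

```python
def combine(user_inputs, contexts, window=1.0):
    """Form one information record per user input.

    Every context item received within `window` seconds of the user input
    is treated as simultaneously received and attached to the record.
    """
    records = []
    for ui in user_inputs:
        related = [c for c in contexts if abs(c["time"] - ui["time"]) <= window]
        records.append({"user_input": ui, "context": related})
    return records
```

Logical relationships, e.g. several transaction messages belonging to the same order, would need an additional key such as a transaction identifier and are not covered by this time-only sketch.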

In an operation S150, the resulting information record is labeled and indexed, respectively, i.e. is
provided with labeling information and indexing information, respectively. The labeling
information and indexing information is used for retrieving the information record or parts of the
information record, e.g. subsequently in consequence of an information request. The labeling
and indexing information may be a set of individual meta-information, respectively, extracted
from the information record and allowing to retrieve, select and/or evaluate the labeled and/or
indexed information record in a useful and effective manner. The labeling and indexing
information themselves are obtained (extracted) from the information record by employing
individual information from the user input information, context information and/or transaction
information / communication details information. The labeling information may refer to a certain
device operation in conjunction with which the information record has been formed from the
received information, i.e. the labeling information may indicate or designate the device
application or device function in consequence of which the information has been received and to
which the received information relates.

For example, a number is dialed by the user of a mobile communication enabled terminal device
having embedded the personal logbook 200 according to an embodiment of the invention. The
user input information may include either a sequence of numbers representing the dialed
telephone number or a name entry from the personal telephone directory associated with a
corresponding telephone number. As context information, the moments of initiation and
termination of the telephone call, the duration of the telephone call, the current date, the current
geographical position and further available context information are associated with this user
input information. Correspondingly, an information record may be formed from this information
and a label information "telephone number" or "telephone call" may be assigned to this
information record, wherein that label information may be extracted from the context information
providing information about the currently focused application, herein the telephone call
application. Further, parts of the available context information may be employed as indexing
information. A subsequent retrieving operation may instruct to deliver all telephone calls having
been logged, or all telephone calls having been logged on a certain date, within a certain period
of time, at a certain geographical position and the like, such that the indexing information acts as
information enabling a filtering of retrieved information or analogously a sorting of retrieved
information, i.e. sorting by time/date, duration of time, geographical position, etc.
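
The forming, labeling and indexing of such a telephone call record may be sketched as follows. This is only a minimal illustration: the class InformationRecord, the function label_and_index, all dictionary keys and the sample values are assumptions made for this sketch and are not taken from the described embodiment.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class InformationRecord:
    """One logged record; all field names are illustrative assumptions."""
    user_input: dict
    context: dict
    label: str = ""
    index: dict = field(default_factory=dict)


def label_and_index(record: InformationRecord) -> InformationRecord:
    # The label is taken from the context information that names the
    # currently focused application (here: the telephone call application).
    record.label = record.context["focused_application"]
    # Selected parts of the context information serve as indexing information.
    for key in ("start", "duration_s", "position"):
        if key in record.context:
            record.index[key] = record.context[key]
    return record


# Made-up sample data for a dialed number.
call = label_and_index(InformationRecord(
    user_input={"dialed_number": "555-0100"},
    context={
        "focused_application": "telephone call",
        "start": datetime(2002, 12, 6, 18, 0),
        "duration_s": 125,
        "position": (60.17, 24.94),
    },
))
```

The label answers the question "what kind of record is this", whereas the index entries are the values a later retrieving operation can filter or sort by.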

As a further example, a user of a mobile communication enabled terminal device having
embedded the personal logbook 200 according to an embodiment of the invention accesses the
internet by means of a WAP (wireless application protocol) browser for paying a hotel bill by
electronic payment employing the service of an e-commerce service provider. The personal
logbook receives user input information corresponding to the WAP-browser usage, e.g. amongst
others a URL (uniform resource locator) of the service provider and user input information of a
form filled out by the user to order the payment, such as address and account information about
the hotel. The user input information is analogously accompanied by context information such
as the aforementioned moments of initiation and termination of the use of the WAP-browser, the
current date, the current geographical position and further available context information to be
associated with the user input information. Additionally, transaction information is also available
and received by the logbook, where the transaction information may comprise a (unique)
transaction ID (identification sequence) and individual information about the addressee, account,
amount etc. The resulting information record may be labeled as "transaction" and indexed with
selected individual context information and/or individual transaction information.



Alternatively, the received information may be combined in two information records, one
relating only to the WAP-browsing usage and the other relating to the transaction. Both
information records are formed in combination with the aforementioned context information.
Correspondingly, the first information record may be labeled "WAP browser" and the second
information record may be labeled "transaction".

Further transaction information may be received in consequence of the initiated transaction. This
transaction information is logged in conjunction with context information. The aforementioned
transaction ID may be comprised in all transaction information received in consequence
of the initiated transaction, such that creating indexing information based on the
transaction ID allows a user to retrieve all transaction information relating to one and the same
transaction procedure even in case the transaction information is logged at different moments
in time. Alternatively, all transaction information logged at different moments in time may be
combined into one information set by a logical relationship being based on the unique transaction
ID.
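
How indexing information based on the transaction ID ties together entries logged at different moments may be sketched as follows. The entry layout, the IDs and the function name index_by_transaction_id are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical transaction information logged at different moments in time.
log = [
    {"transaction_id": "T-1001", "logged_at": "2002-12-06T18:05", "info": "payment ordered"},
    {"transaction_id": "T-2002", "logged_at": "2002-12-06T18:20", "info": "payment ordered"},
    {"transaction_id": "T-1001", "logged_at": "2002-12-07T09:30", "info": "payment confirmed"},
]


def index_by_transaction_id(entries):
    """Build indexing information mapping each transaction ID to its entries."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["transaction_id"]].append(entry)
    return index


by_id = index_by_transaction_id(log)
# All information relating to one and the same transaction procedure.
related = [entry["info"] for entry in by_id["T-1001"]]
```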

In an operation S160, the information record is referenced. In terms of a logic separation the
personal logbook 200 can be completely separated from the logbook storage where the actual
information records are stored. In this case of a separation the personal logbook 200 stores only
relationships between different instances of information records and points to the actual logbook
storage in order to allow information record retrieval. The referencing of the information record
allows to determine linking information suitable for retrieving the information record from the
actual logbook storage such as an external logbook storage 230. More precisely, the linking
information, labeling information and/or the indexing information are stored in the personal
logbook 200 itself, i.e. in the logbook ID storage 212, which allow a useful retrieving, filtering,
selecting and/or sorting of the stored information records, whereas further information associated
with the information stored in the logbook ID storage 212 of the logbook 200 is evacuated to the
actual logbook storage, i.e. to the external logbook storage 230 and more precisely to the logbook
information storage 232 outside of the personal logbook 200.




In practice, the above proposed hard separation of the personal logbook and the actual logbook
information storage might be too elaborate; instead, only larger portions of information such as
audio and video clips might be moved to the external logbook storage 230 and more precisely to the
external logbook information storage 232, respectively, whereas small portions of information such
as textual information are kept in the personal logbook 200, the logbook storage 210 and more
precisely in the logbook information storage 211, respectively.

If necessary, the linking of information stored in the logbook storage 210 of the personal logbook 
200 and stored in the external logbook storage 230 coupled to the personal logbook 200 may be
established and realized by the aforementioned referencing information being stored in a 
dedicated storage area, i.e. in the logbook ID storage 212 of the logbook storage 210 included in 
the personal logbook 200 and the corresponding counterpart logbook ID storage 231 of the 
external logbook storage 230. The storage management component 213 manages the information 
exchange between the logbook storage 210 embedded in the personal logbook 200 and the 
external logbook storage 230. 
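
The split between labeling / indexing / linking information kept in the logbook and content evacuated to an external storage may be sketched as follows. The class PersonalLogbook, its attributes and the threshold SIZE_LIMIT are assumptions of this sketch; the three dictionaries merely stand in for the logbook ID storage 212, the logbook information storage 211 and the external logbook storage 230.

```python
SIZE_LIMIT = 1024  # bytes; hypothetical threshold for evacuating content


class PersonalLogbook:
    def __init__(self):
        self.id_storage = {}    # stands in for the logbook ID storage 212
        self.info_storage = {}  # stands in for the logbook information storage 211
        self.external = {}      # stands in for the external logbook storage 230

    def store(self, record_id, label, index, content: bytes):
        # Large content is evacuated; small content stays in the logbook.
        location = "external" if len(content) > SIZE_LIMIT else "internal"
        target = self.external if location == "external" else self.info_storage
        target[record_id] = content
        # Labeling, indexing and linking information always stay in the logbook.
        self.id_storage[record_id] = {"label": label, "index": index, "link": location}

    def retrieve(self, record_id) -> bytes:
        # Follow the linking information to the storage that holds the content.
        link = self.id_storage[record_id]["link"]
        source = self.external if link == "external" else self.info_storage
        return source[record_id]


logbook = PersonalLogbook()
logbook.store("r1", "note", {"date": "2002-12-06"}, b"short text")
logbook.store("r2", "video", {"date": "2002-12-06"}, b"\x00" * 5000)
```

Because the ID storage holds an entry for every record, filtering and sorting can run without touching the external storage; only the final retrieval of evacuated content needs it.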

The proposed concept is to separate information organization (the personal logbook) from
information storage so that the logbook information alone is also valuable for the user, whereas
(memory-intensive) content is handled separately. While logging of basic information can
already be implemented on electronic devices (terminals) with a modest memory budget, the full
benefits of a separation of information and its organization are apparent in electronic devices
(terminals) with mass storage capability, such as an optical re-writable storage medium
(re-writable compact disc (CD), digital versatile disk (DVD) etc), a magnetic storage medium (hard
disk etc), a permanent memory medium (FLASH memory, magnetic memory, ferrite memory
etc), a network based or internet based storage solution and the like, which is connected locally
(implemented in or detachably connected with the electronic device) or connected via a
communication network to the electronic device.

As an example, it may be assumed that a user of a mobile communication enabled processing
device (such as a PDA or an electronic organizer with a communication adapter or a mobile
communication terminal device) having implemented the personal logbook according to an
embodiment of the invention downloads a video presentation (slide show, pictures, large
electronic document). The downloaded information is logged by the personal logbook in
conjunction with the corresponding context information, which means that the complete video
presentation (slide show, pictures, large electronic document) is present in the logbook storage.
In particular, mobile processing devices are commonly equipped with limited storage capacity due
to their mobility (size, weight) and usability. The evacuation of such information to an externally
coupled storage with high storage capacity may be necessary.

In an operation S170, the information is stored. The storing of the information record and
labeling, indexing and/or referencing information is performed in accordance with the above
described storage solution chosen for the personal logbook 200. The labeling, indexing and/or
referencing information as well as the information record may be organized in a data base which
allows to access the labeling and indexing information in an efficient way for identifying one or
several information records which match one or more retrieval instructions.

Since the amount of information can be enormous, especially in view of multimedia information 
such as video clips and audio clips requiring extensive data storage capacity, a policy / regulation of
data conservation and deletion has to be in place. 

Independent of the chosen storage solution the total storage capacity is always finite, such that
a policy / regulation on what information to keep and what information to discard is required. By
default a life time can be associated with all information (records), meaning in practice that the
oldest information is deleted first. The deletion of expired information may be operated
automatically at the moment of expiration or at the moment of requiring storage capacity which
is occupied by expired information and which is required for storing new information.
Alternatively, expired or older information can be moved to an archive medium, which can be
embodied analogously to the aforementioned external storage solution, for clearing up the
logbook storage capacity. An archive medium may be a mass storage of a personal computer
(PC), a removable storage medium or a network-based storage solution being connected
detachably with the electronic device and the personal logbook for archiving purposes. The same
policy / regulation can be applied to the logbook, the logbook storage and/or the external
logbook storage.
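
One possible reading of such a policy, in which expired records are archived oldest first at the moment storage capacity is required, may be sketched as follows. The function apply_retention, the record layout and the counting of capacity in number of records are assumptions of this sketch.

```python
from datetime import datetime, timedelta


def apply_retention(records, archive, now, capacity):
    """Archive expired records, oldest first, until at most `capacity`
    records remain. Each record has a 'created' time and a 'lifetime'."""
    expired = sorted(
        (r for r in records if r["created"] + r["lifetime"] <= now),
        key=lambda r: r["created"],
    )
    kept = list(records)
    for record in expired:
        if len(kept) <= capacity:
            break  # enough capacity has been freed
        kept.remove(record)
        archive.append(record)
    return kept


week = timedelta(days=7)
records = [
    {"id": "a", "created": datetime(2002, 12, 1), "lifetime": week},
    {"id": "b", "created": datetime(2002, 12, 10), "lifetime": week},
    {"id": "c", "created": datetime(2002, 12, 30), "lifetime": week},
]
archive = []
kept = apply_retention(records, archive, now=datetime(2003, 1, 1), capacity=2)
```

In this run records "a" and "b" have both expired, but only "a" is moved to the archive because archiving it already frees the required capacity. Records that have not expired are never removed, so the result may still exceed the capacity if nothing has expired.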

In an operation S180, the logging of information with respect to a certain set of user input
information, context information and/or transaction / communication details information is
finished. The logging operation embodied as the presented operational sequence, being one
possible embodiment according to the invention, may be operated as a background process of the
electronic device having implemented the personal logbook. As soon as a set of information to be
logged is received by the personal logbook the aforementioned logging operation is performed by
the electronic device and the personal logbook, respectively.

A personal logbook is most appealing to a user if the collected information can be utilized with a
broad range of applications. The critical operation of the above described logging process is the
labeling / indexing operation S150, since only a useful labeling and indexing ensures the potential
of the logbook information allowing for managing efficiently and usefully the huge amount of
collected information.

Fig. 2b shows a flow chart depicting an operational sequence for operating and in particular for 
retrieving information from a logbook component such as illustrated in Fig. 1b according to an
embodiment of the invention, respectively. 

In an operation S210, the retrieving operation for retrieving information from the personal 
logbook 200 is started. 

In an operation S220, a request for retrieving information from the personal logbook is received. 
Such a request may be received by the aforementioned request interface 220 of the personal 
logbook 200. 

In an operation S230, the one or several information records are retrieved from the personal
logbook 200. The request includes instructions which specify the kind of information to be
retrieved from the personal logbook 200. The request specifies a certain designation or
combination of designations in accordance with which previously stored information records
may be selected. In view of the labeling and/or indexing operation S150 referred to in Fig. 2a for
example, the instructions command e.g. the logbook access component 221 to retrieve
information records associated with a certain labeling information, associated with a certain
indexing information and/or any combination of information thereof.

In an operation S240, the retrieved information records matching the instructions for
retrieving which are included in the request may be further sorted in accordance with additional
instructions comprised by the request. The sorting of retrieved information records may
analogously be based on a certain designation or combination of designations. In view of the
labeling and/or indexing operation S150 referred to in Fig. 2a for example, the instructions
command e.g. the logbook access component 221 to sort the retrieved information records in
accordance with a certain associated labeling information, a certain associated indexing
information and/or any combination of associated information thereof.
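
The selecting of operation S230 and the sorting of operation S240 may be sketched as one small function. The function name retrieve, its parameters and the record layout are assumptions of this sketch and do not describe the logbook access component 221 itself.

```python
from datetime import date

# Hypothetical stored records with labeling and indexing information.
records = [
    {"label": "telephone call", "index": {"date": date(2002, 12, 6), "duration_s": 300}},
    {"label": "transaction", "index": {"date": date(2002, 12, 6), "duration_s": 40}},
    {"label": "telephone call", "index": {"date": date(2002, 12, 5), "duration_s": 60}},
    {"label": "telephone call", "index": {"date": date(2002, 12, 6), "duration_s": 20}},
]


def retrieve(records, label=None, index_equals=None, sort_by=None):
    """Select records by label and index values, then optionally sort them."""
    index_equals = index_equals or {}
    hits = [
        r for r in records
        if (label is None or r["label"] == label)
        and all(r["index"].get(k) == v for k, v in index_equals.items())
    ]
    if sort_by is not None:
        # Sort by one piece of indexing information, ascending.
        hits.sort(key=lambda r: r["index"][sort_by])
    return hits


# All telephone calls logged on a certain date, sorted by duration.
calls = retrieve(records, label="telephone call",
                 index_equals={"date": date(2002, 12, 6)}, sort_by="duration_s")
```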

The operation S230 and the operation S240 may be a combined operation such that the
retrieving, filtering and/or sorting is operated in a combined operation.

In an operation S250, the retrieved, filtered and/or sorted information records are transmitted to a
certain application executed on the electronic device. The addressed application may be coded in
the request.

In an operation S260, the retrieving operation for retrieving information from the personal
logbook 200 is finished. 

In the operation S230, the retrieving of information in accordance with instructions comprised by
the request may involve the retrieving of information from an external logbook storage acting as
evacuation storage. The selection of information records which match
the information request may be performed in conjunction with the logbook ID storage 212
including labeling information, indexing information and/or linking (referencing) information.
The actual information is retrieved by employing the corresponding linking (referencing)
information from the evacuation storage, i.e. the external logbook storage 230 including a
corresponding logbook ID storage 231 and a logbook information storage 232 including the
information to be retrieved.

For example, in view of a mobile communication terminal device having implemented the
personal logbook according to an embodiment of the invention, a request for retrieving
information may be generated and transmitted to the personal logbook by user selection of a
fixed device feature or application feature such as the command (operation menu item) for
retrieving the last dialed telephone numbers, the last received telephone calls or the last missed
telephone calls. The generation of a corresponding request for retrieving information about the
last dialed telephone numbers, the last received telephone calls or the last missed telephone calls
from the personal logbook may be initiated by selecting a certain application function or by
operating a certain dedicated key of the mobile communication terminal device. Since the
logbook logs all user input information, context information and communication details
information, the corresponding information to be retrieved is present in the personal
logbook and is presented to the user correspondingly. Similarly, a history list of last transactions
and the like may be requested by the user and finally presented correspondingly to the user.

For example, in view of an electronic device having implemented the personal logbook according
to an embodiment of the invention, a request may comprise a user inputted text string (e.g. a
name, an address etc) in accordance with which information records are to be retrieved which
include matching text information. This example can result in a response including a plurality
of different information records, such as an information record relating to a dialed telephone
number, an information record relating to a calendar entry, an information record relating to an
editing of the personal telephone directory and further information records containing the
inputted text string. In order to retrieve matching information records from the personal logbook
the inputted text string may be compared with the labeling and/or indexing information
associated with the information records or may be compared with the whole information content of
the information records.

In view of an electronic device having implemented the personal logbook according to an
embodiment of the invention, a request may comprise information being associated with an
application context. For example, a user currently scrolls through the personal telephone directory
and discovers a name for which he wants to retrieve further information. On user instruction, a
request for retrieving information from the personal logbook is generated, which comprises for
example the designation of the discovered entry of the personal telephone directory.
Correspondingly, the personal logbook serves all information logged and stored therein which
matches the provided designation extracted from the current application context. The
generation of a logbook information request in accordance with a certain application context is
also applicable to other applications and/or device functions having an analogous context.

The logbook information retrieved and finally presented in response to a request may be
filtered or sorted in a pre-defined or user-defined way. History lists are usually sorted
chronologically (backward) and limited to a certain number of entries (e.g. ten entries).
Alternatively, the sorting of the retrieved information records may be based on usage frequency
information, which means that information records associated with a low usage frequency are
sorted at the end of a resulting sequence of information records or are filtered out. Further sorting
and filtering procedures are possible since the context information offers a huge number of
individual information allowing an effective and useful sorting and filtering.
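
The two orderings mentioned above, a backward chronological history list limited to a number of entries and a sorting by usage frequency, may be sketched as follows. The keys timestamp and usage_count, the function names and the sample numbers are hypothetical.

```python
def history_list(records, limit=10, min_usage=0):
    """Newest first, at most `limit` entries, rarely used records filtered out."""
    frequent = [r for r in records if r["usage_count"] >= min_usage]
    newest_first = sorted(frequent, key=lambda r: r["timestamp"], reverse=True)
    return newest_first[:limit]


def by_usage_frequency(records):
    """Alternative ordering: records with a low usage frequency end up last."""
    return sorted(records, key=lambda r: r["usage_count"], reverse=True)


# Made-up dialed numbers; a larger timestamp means a more recent call.
dialed = [
    {"number": "555-0101", "timestamp": 1, "usage_count": 7},
    {"number": "555-0102", "timestamp": 2, "usage_count": 1},
    {"number": "555-0103", "timestamp": 3, "usage_count": 4},
]
last_two = history_list(dialed, limit=2)
```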

The personal logbook may be based on a data base technology for organizing the described
storing structure. Moreover, the coding of the labeling / indexing information may be based on a
device independent markup language coding. A suitable coding technology for the data being
managed by the personal logbook may be a coding technology in accordance with the extensible
markup language (XML) or binary encoded XML in conjunction with a corresponding document
type definition (DTD) and related markup language representations. The content and
presentation / organization of the logbook data may be structured separately and the content can
be referenced by an addressing information (referencing information) like provided with a
uniform resource locator (URL), uniform resource identifier (URI) and comparable resource
addressing schemes.

The following sections will relate to different operations performed by the personal logbook and
relate to certain user provided information to be logged for being provided to be retrieved. A first
detailed description will relate to labeling and/or indexing user provided information with
information obtained from an electronic calendar and from calendar entries thereof. A second
detailed description will relate to labeling and/or indexing user provided audio information with
information being obtained from the user actions being operated during the audio recording and
provided by user actions during the audio recording. A third detailed description will relate to a
method for embedding information of any type into a user provided audio information.
Especially, the method defines a new type of markup language encoding being based on a
language encoding system and alphabet for embedding characters (symbols) of any mark-up
language around or within the audio clip.
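
Returning to the XML based coding of labeling / indexing information of the personal logbook mentioned above, a minimal sketch using the ElementTree module of the Python standard library is given below. The element and attribute names (record, index, content, href) and the reference value are invented for this illustration; no particular document type definition is implied.

```python
import xml.etree.ElementTree as ET


def encode_record(label, index, reference):
    """Encode labeling, indexing and referencing information as XML text."""
    record = ET.Element("record", {"label": label})
    for name, value in index.items():
        ET.SubElement(record, "index", {"name": name}).text = str(value)
    # The content itself is only referenced, e.g. by a URL or URI.
    ET.SubElement(record, "content", {"href": reference})
    return ET.tostring(record, encoding="unicode")


xml_text = encode_record(
    "telephone call",
    {"date": "2002-12-06", "duration_s": 125},
    "file:///logbook/records/42",  # hypothetical referencing information
)
decoded = ET.fromstring(xml_text)
```

Keeping only a reference to the content inside the markup mirrors the separation of content and organization: the XML part stays small even if the referenced content is a large audio or video clip.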

II. Context Labeling of Multimedia Information

As aforementioned in view of the personal logbook, but also in regard to common retrieval
systems for information and more particularly multimedia information (data) retrieval systems, such
systems have to make use of reliable attributes (herein denoted also as labeling information
and/or indexing information) in order to allow reliable retrieval and provide for useful results.
However, most of the attributes that can be extracted automatically from multimedia information
such as video or audio sequences may be either less significant from a perspective of a user -
like, e.g., colors, textures, etc. - or less reliable - like, e.g., persons, objects, etc.

Existing multimedia information retrieval systems being state of the art exhibit several
disadvantages. In case of a manual labeling of the multimedia information high reliability may
be achieved, where high reliability shall specify a high precision of retrieval results which is of
course needed and desired. But the manual labeling forces the user(s) of such a multimedia
information retrieval system to maintain reliably the labeling with useful labeling information
comprehensible and employable even for future use. Alternatively, the labeling of the multimedia
information may be based on an automatic data analysis, such as detection of persons within the
multimedia information, e.g., face recognition or speaker (voice) recognition, which is a
demanding process requiring high data processing capability which may not be available. In
addition, good visual models or auditory patterns are needed for all the persons that are to be
recognized visually or auditorily. Thus, persons that are not contained within some model/pattern
base cannot be recognized. Moreover, multimedia information that is considered as being
relevant with regard to a specific person is restricted to those information objects that directly
contain this person, i.e., his face or his voice in case of referring to the automatic recognition.

Therefore, the present invention relates further to a method for labeling information and more
particularly multimedia information with calendar data resulting in both highly reliable and
cost-efficient attributes applicable for retrieval thereof. The inventive concept makes use of
calendar entries in order to label / index information. This kind of labeling / indexing is supposed
to be reliable because the context information obtained therefrom has been contributed manually
by the user himself, and the employing of this manually entered context information is cost-efficient
since this tapping of user knowledge is only targeting information that has already been stored by
the user within his personal calendar.

The following description of labeling / indexing information will be given with respect to
multimedia information which may primarily include images, pictures, legacy documents, slide
sequences, video sequences etc, which have in common that the extracting of reliable information
for labeling and/or indexing is difficult or even impossible. From the above given introduction it
should be understood that even though the description refers to multimedia information the
inventive concept is not limited thereto since it is applicable to any information.

The usage of electronic calendars is widespread due to the advantages which such electronic
calendars provide to users. Electronic calendars offer an editing, modifying and deleting of entries
as often as required and/or desired without having any negative impact on the overall appearance.
The calendar entries allow to define alerts triggered by time and/or date. Further, calendar entries
may be inter-linked and intra-linked such that information may be passed from or to other
applications, such as contact information included in a calendar entry and being inter-linked with
a personal telephone directory for selecting fast and easily a certain telephone directory entry and
vice versa.

The employing of calendar information for labeling / indexing multimedia information requires
certain prerequisites relating to the semantic structuring of the calendar information as well as
context information relating to the multimedia information to be labeled / indexed.

All entries within the electronic calendar associated with a certain event should be semantically
structured. As part of the semantic structure for calendar entries the following fields appear to be
useful for the labeling of multimedia information:

- "Date": This field shall contain a date at which the event takes place;

- "Time": This field shall contain a moment of time or a period of time at which the event takes
place;

- "Who": This field shall contain the participants in the event in addition to the owner of the
electronic calendar;

- "Where": This field shall contain information about a location at which the event takes place;

- "Notes": This field shall contain any free text notes;

- "Type": This field shall contain information about the type of the event such as a meeting, a
phone call, a telephone conference, a leisure activity etc; and

- "Importance": This field shall contain a rating information about the importance of the event.

The presented semantic structure of calendar entries is just an example. The inventive concept
primarily requires date and time information of calendar entries which are available
independently from the specific implemented semantic structure due to the concept of the
electronic calendar itself. The semantic structure of calendar entries relates to the labeling /
indexing of the multimedia information. The labeling and/or indexing information finally
associated with the multimedia information is extracted from the calendar entries such that the
proposed semantic structure ensures that the extracted labeling and/or indexing information is
reliable and sensible. The proposed semantic structure of calendar entries may be varied,
modified or adapted to certain conditions without departing from the scope of the inventive concept.

Fig. 3a depicts two schematically structured calendar entries #i and #j. The calendar entries
include the above proposed semantic fields for structuring reliably the information comprised by
the calendar entries. The first example calendar entry #i specifies a meeting (field "type") near
the north pole (field "where") with Santa Claus and Rudy "Rednose" Reindeer (field "who") on
Friday, December 6th, 2002 (field "date") starting at 6.00 pm and ending at 7.00 pm (field
"time"). The calendar entry #i also includes a free text note (field "notes"). The second example
calendar entry #j specifies a telephone conference (field "type") with Jane (field "who") on Friday,
December 6th, 2002 (field "date") starting at 7.10 pm and ending at 7.25 pm (field "time"). The
calendar entry #j also includes a free text note (field "notes").
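
The two example entries may be written down as a small data structure. The class CalendarEntry and the function labels_from are assumptions of this sketch, the "date" and "time" fields are merged into start and stop values for brevity, and the note texts as well as the empty "where" field of entry #j are placeholders because they are not specified above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CalendarEntry:
    start: datetime  # "date" and "time" fields combined
    stop: datetime
    who: list
    where: str
    type: str
    notes: str = ""
    importance: int = 0


entry_i = CalendarEntry(
    start=datetime(2002, 12, 6, 18, 0), stop=datetime(2002, 12, 6, 19, 0),
    who=["Santa Claus", 'Rudy "Rednose" Reindeer'], where="near the north pole",
    type="meeting", notes="free text note",
)
entry_j = CalendarEntry(
    start=datetime(2002, 12, 6, 19, 10), stop=datetime(2002, 12, 6, 19, 25),
    who=["Jane"], where="", type="telephone conference", notes="free text note",
)


def labels_from(entry):
    """Extract labeling / indexing information from the structured fields."""
    return {"who": entry.who, "where": entry.where, "type": entry.type}
```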

In view of the example calendar entries #i and #j it can be recognized that the proposed semantic
structuring of calendar entries is very useful since the semantic structuring ensures that
information extracted from calendar entries and used for labeling / indexing multimedia
information represents reliable information guaranteeing high retrieval precision in case that a
retrieval is based on such information. For example, multimedia information may be labeled
and/or indexed with name information which can be extracted reliably from the "who" field or
multimedia information may be labeled and/or indexed with place information which can be
extracted reliably from the "where" field.

It shall be noted that this kind of extracting information from semantically structured calendar
entries to be used for labeling / indexing can be transferred to any other structuring of calendar
entries.

A point of time or a period of time has to be allocable to the multimedia information to be
labeled / indexed. For example, a personal collection of multimedia information is provided and
available that reflects the personal recordings of reality of a user. This collection may include
continuous multimedia recordings, automatically taken multimedia snapshots or manually
triggered multimedia snapshots, where snapshot is used as a general term for performing
recordings of any type of multimedia information comprising images, pictures, picture
sequences, video sequences, legacy documents and the like. Timestamps are attachable to each
entry of the personal collection of multimedia information, where the timestamps may specify a
moment of time or a period of time at which the recording has been performed.

The inventive concept is based on the main idea to associate calendar entries and
multimedia information by matching time information provided by the calendar entries with the
time information (timestamps) associated with the multimedia information. A matching of the time
information obtained therefrom allows to associate one or more calendar entries with one or more
multimedia information sequences and in particular to assign information extracted from the
matching one or more calendar entries as labeling / indexing information of one or more
multimedia information sequences. In other words, the electronic calendar is used to label / index
multimedia information in a retrospective way.
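
The matching of a timestamp against calendar timeframes may be sketched as follows. The sketch uses a strict containment test only and takes the "who" field as the extracted labeling information; the function names, the dictionary layout and the file name are hypothetical.

```python
from datetime import datetime

# Hypothetical calendar entries reduced to their timeframe and "who" field.
calendar = [
    {"start": datetime(2002, 12, 6, 18, 0), "stop": datetime(2002, 12, 6, 19, 0),
     "who": ["Santa Claus"]},
    {"start": datetime(2002, 12, 6, 19, 10), "stop": datetime(2002, 12, 6, 19, 25),
     "who": ["Jane"]},
]


def matching_entries(timestamp, entries):
    """Return the calendar entries whose timeframe contains the timestamp."""
    return [e for e in entries if e["start"] <= timestamp <= e["stop"]]


def label_recording(recording, entries):
    """Attach information from matching calendar entries to a recording."""
    recording["labels"] = [
        name
        for entry in matching_entries(recording["timestamp"], entries)
        for name in entry["who"]
    ]
    return recording


photo = label_recording(
    {"file": "snapshot.jpg", "timestamp": datetime(2002, 12, 6, 19, 15)},
    calendar,
)
```

A recording taken at 7.15 pm thus receives the label "Jane" although nobody typed that name when the recording was made; the label comes from the calendar alone.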

Thus, the multimedia information is not only labeled with a specific labeling information where
the labeling information is clearly assignable to the multimedia information, but also with
information that is extractable from the one or more matching calendar entries and that is related
in a general sense.

In order to provide for a better accuracy for this kind of labeling, some assignment of so-called
membership grades (cf. Fuzzy Set theory) is sensible, where the membership grade depends on
the specific time dependency of a recording within the timeframe defined by a calendar entry.

Fig. 3b shows a plot diagram illustrating example membership functions used for associating a
membership grade to certain events according to an embodiment of the invention. The abscissa
comprises time values whereas the ordinate comprises membership grades. For example, a
scheduled meeting begins at the moment of time "start" and ends at the moment of time "stop".
The period of time between "start" and "stop" is the timeframe of this meeting. Both the moments
"start" and "stop" are indicated in the plot diagram shown in Fig. 3b.

Two example membership functions are assigned to this timeframe. The membership function 1
and the membership function 2 both start rising from a membership grade value zero at a first
distinct moment of time before the beginning of the timeframe of the calendar entry.
Thereafter, the membership functions rise with different gradients up to the maximum
membership grade value one before being valid within the timeframe for certain different periods
of time. Subsequently, the membership functions drop again with different gradients and become
zero at a second distinct moment of time after the ending of the timeframe of the calendar entry.

The rising of the membership function before the actual beginning of the timeframe and the
dropping of the membership function after the actual ending of the timeframe ensure that
multimedia information is associated with the respective calendar entry to which the timeframe
belongs in case that the timestamp of the multimedia information is within the timeframe or is
in the projecting periods of time defined by the exceeding definition area of the membership
function in respect to the timeframe of the respective calendar entry. Examples will be given
below to clarify this.

For simplicity the rising point (designated above as first moment of time) as well as the reaching 
point (designated above as second moment of time) of both membership functions 1 and 2 are 
chosen identical and the membership functions are defined as sectionalized linear curves. The 
rising and reaching points as well as the complete shape of the membership function are not 
limited to the depicted examples; any shape, including any points of onset and termination, 
is possible. The implemented possibility distribution should be based on the habits of the user 
who is the owner of both the multimedia information and the electronic calendar. However, the 
employed distribution has to be adjusted manually only once. Afterwards this adjusted 
distribution is valid for all multimedia information recorded in the past or to be recorded 
in the future, and for all calendar entries, respectively. 

Fig. 3c depicts an operational sequence for generating one or more labeling / indexing 
information of multimedia information according to an embodiment of the invention. 

Basically, the conceptual idea can be reduced to a selection of the operations depicted in 
Fig. 3c. The object is to label and/or index multimedia information with one or more calendar 
entries and information extracted from the one or more calendar entries, respectively. 

In a first step a timestamp is determined which is associated with the multimedia information to 
be processed. This timestamp is compared with the available calendar entries and the matching 
ones are selected. Membership grades are obtained from the aforementioned membership 
functions assigned to the timeframes of the matching calendar entries, where each membership 
grade relates to one matching calendar entry. Finally the multimedia information is labeled 
/ indexed with the one or more matching calendar entries, or information extracted from the one 
or more matching calendar entries is used for labeling / indexing the multimedia information, 
respectively. 

The following description of an operational sequence relates to a more detailed description of the 
method for labeling multimedia information according to an embodiment of the invention. 

In an operation S300, the method for labeling a multimedia information is started. 

In an operation S310, a moment of time or a period of time is obtained which relates to a 
recording event of the multimedia information, respectively. In an operation S315, the moment 
of time or the period of time associated with the multimedia information may be extracted from 
the multimedia information or may be retrieved from information being provided with or being 
assigned to this multimedia information, respectively. 

In an operation S320, it is detected whether a moment of time or a period of time is associated 
with the multimedia information. In case of a moment of time being determined to be associated, 
the operational sequence is continued with an operation S325 whereas otherwise, i.e. in case of a 
period of time being determined to be associated, the operational sequence is continued with an 
operation S355. 

In an operation S325, the total number of calendar entries available is searched to find matching 
calendar entries. That is, the obtained moment of time is compared with the timeframes and the 
extended timeframes of the calendar entries, respectively, where an extended timeframe is 
defined by the period of time in which the used membership function assigned to a timeframe of 
a calendar entry is not equal to zero. A calendar entry matches in case the moment of time is 
within the timeframe and the extended timeframe, respectively. 

In an operation S330, for each matching calendar entry the following operations S335 to S345 
are performed. 

In an operation S335, a membership grade is determined from the membership function assigned 
to the timeframe of the current selected matching calendar entry in accordance with the moment 
of time being determined as being associated to the multimedia information. The membership 
grade is employed as a reliability value which indicates to which extent the currently selected 
calendar entry relates in any kind to the multimedia information. 

In an operation S340, the multimedia information is labeled and indexed, respectively. The 
labeling / indexing is based on information being available from the current selected calendar 
entry, i.e. the whole calendar entry itself or the whole information contained in the calendar 
entry, selected information extracted from the calendar entry and the like, wherein the extracted 
information may be used in case that the calendar entry includes semantically structured 
information (fields). 

In an operation S345, it is determined whether all matching calendar entries have been processed. 
In case there exist one or more remaining calendar entries which have not been processed up to 
now, the operational sequence returns to operation S330, whereas otherwise the operational 
sequence is continued with an operation S350. 

In an operation S350, the method for labeling a multimedia information is completed. 
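
The branch for a single moment of time (operations S325 to S345) can be sketched as follows. 
This is a minimal illustration under stated assumptions, not the implementation of the 
embodiment: the class `CalendarEntry`, the helper `hm` and the function `label_moment` are 
hypothetical names, the membership functions are assumed to be sectionalized linear, and the 
breakpoints are those given further below for the calendar entries #i and #j of Fig. 3d. 

```python
from dataclasses import dataclass


def hm(hour, minute):
    """Convert a 24-hour clock time to minutes since midnight."""
    return hour * 60 + minute


@dataclass
class CalendarEntry:
    name: str
    rise_start: int   # membership function leaves zero
    rise_end: int     # membership function reaches one
    drop_start: int   # membership function leaves one
    drop_end: int     # membership function returns to zero

    def grade(self, t):
        """Piecewise linear membership grade at moment t."""
        if t <= self.rise_start or t >= self.drop_end:
            return 0.0
        if t < self.rise_end:
            return (t - self.rise_start) / (self.rise_end - self.rise_start)
        if t <= self.drop_start:
            return 1.0
        return (self.drop_end - t) / (self.drop_end - self.drop_start)


def label_moment(t, entries):
    """Return (entry name, membership grade) for every matching entry."""
    labels = []
    for entry in entries:
        g = entry.grade(t)
        if g > 0.0:  # S325: moment lies inside the extended timeframe
            labels.append((entry.name, g))  # S335 (grade) and S340 (label)
    return labels


entries = [
    CalendarEntry("#i", hm(17, 58), hm(18, 2), hm(18, 46), hm(19, 6)),
    CalendarEntry("#j", hm(19, 4), hm(19, 8), hm(19, 20), hm(19, 28)),
]
print(label_moment(hm(18, 42), entries))  # image taken at 6:42 pm
```

For the moment 6:42 pm the sketch yields a single label, calendar entry #i with membership 
grade 1.0. A moment at which both membership functions are non-zero would be labeled with 
both calendar entries, each with its own grade. 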

In an operation S355, the total number of calendar entries available is searched to find matching 
calendar entries. That is, the obtained period of time is compared with the timeframes and the 
extended timeframes of the calendar entries, respectively, where an extended timeframe is 
defined by the timeframe together with an overhead at its beginning and ending in accordance 
with the correspondingly used membership function, i.e. by the period of time in which the 
membership function assigned to the timeframe of a calendar entry is not equal to zero. 
A calendar entry matches in case the period of time overlaps with the timeframe and the 
extended timeframe, respectively, either completely or at least partially. 

In an operation S360, for each matching calendar entry the following operations S365 to S395 
are performed. 

In an operation S365, it is determined whether the period of time being identified to be 
associated with the multimedia information matches partially with the timeframe and the 
extended timeframe of the current selected calendar entry, respectively. In case the period of time 
and the (extended) timeframe partially overlap the operational sequence is continued with 
operation S367. In case the period of time and the (extended) timeframe completely overlap and 
the period of time is within the (extended) timeframe, respectively, the operational sequence is 
continued with operation S370. 

In an operation S367, the multimedia information is divided into one or more sections of 
multimedia information having corresponding sectionalized periods of time. One of the sections 
with a corresponding associated sectionalized period of time matches with the timeframe and the 
extended timeframe of the current selected calendar entry, respectively. The remaining sections 
having corresponding associated sectionalized periods of time neither match nor overlap with 
the timeframe and the extended timeframe of the current selected calendar entry, 
respectively. 
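
The sectioning of operation S367 amounts to intersecting the recording period with the 
extended timeframe. The following sketch is a hedged illustration: the function name 
`section` and the representation of periods as (start, end) pairs in minutes since midnight 
are assumptions. The example values are those of the audio multimedia data #b and the 
calendar entries #i and #j described further below. 

```python
def section(period, extended):
    """Split a recording period against an extended timeframe.

    Returns the section inside the extended timeframe (or None if there
    is no overlap) and the list of remaining sections outside of it.
    """
    start, end = period
    lo, hi = extended
    a, b = max(start, lo), min(end, hi)
    if a >= b:
        return None, [period]
    remaining = []
    if start < a:
        remaining.append((start, a))
    if b < end:
        remaining.append((b, end))
    return (a, b), remaining


audio_b = (18 * 60 + 46, 19 * 60 + 10)      # 6:46 pm to 7:10 pm
extended_i = (17 * 60 + 58, 19 * 60 + 6)    # 5:58 pm to 7:06 pm
extended_j = (19 * 60 + 4, 19 * 60 + 28)    # 7:04 pm to 7:28 pm

print(section(audio_b, extended_i))  # fitting section 6:46 pm to 7:06 pm
print(section(audio_b, extended_j))  # fitting section 7:04 pm to 7:10 pm
```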

In an operation S370, the multimedia information is partitioned into one or more parts of 
multimedia information having corresponding partitioned periods of time. If possible the 
partitioned periods of time may have substantially the same period value, e.g. 5 minutes. The 
value of the partitioned period of time of the last part obtained from the multimedia information 
by partitioning may differ from the desired period value. The period value of the partitioned 
periods of time may be pre-defined and/or user-defined. 
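
The partitioning of operation S370 can be sketched as follows. The function name `partition` 
and the use of minutes since midnight are assumptions for this illustration; the example uses 
the audio multimedia data #a (6:14 pm to 6:22 pm) and the partitioning time value of 6 minutes 
from the example described further below. 

```python
def partition(period, part_length):
    """Cut a period (start, end) into consecutive parts of part_length.

    The last part may be shorter than part_length.
    """
    if part_length <= 0:
        raise ValueError("part_length must be positive")
    start, end = period
    parts = []
    while start < end:
        stop = min(start + part_length, end)
        parts.append((start, stop))
        start = stop
    return parts


audio_a = (18 * 60 + 14, 18 * 60 + 22)  # 6:14 pm to 6:22 pm
print(partition(audio_a, 6))            # two parts: 6 minutes and 2 minutes
```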

In an operation S375, for each part the following operations S380 to S390 are performed. 

In an operation S380, a membership grade is determined from the membership function assigned 
to the timeframe of the current selected matching calendar entry in accordance with the 
partitioned period of time. The membership grade is employed as a reliability value which 
indicates to which extent the currently selected calendar entry relates in any kind to that current 
selected part of the multimedia information. 

In an operation S385, the multimedia information is labeled and indexed, respectively. The 
labeling / indexing is based on information being available from the current selected calendar 
entry, i.e. the whole calendar entry itself or the whole information contained in the calendar 
entry, selected information extracted from the calendar entry and the like, wherein the extracted 
information may be used in case that the calendar entry includes semantically structured 
information (fields). 

In accordance with the partitioning of the multimedia information such a partitioned multimedia 
information is labeled and indexed section-wise for a certain period of time, increasing the value 
of the assigned membership grade as a quantity for evaluating the retrieval reliability. The 
partitioning allows a fine grid of reliability values to be applied to the multimedia information. 

In an operation S390, it is determined whether all parts of the partitioned multimedia information 
have been processed. In case there exist one or more remaining parts which have not been processed 
up to now, the operational sequence returns to operation S375, whereas otherwise the operational 
sequence is continued with an operation S395. 

In an operation S395, it is determined whether all matching calendar entries have been processed. 
In case there exist one or more remaining calendar entries which have not been processed up to 
now, the operational sequence returns to operation S360, whereas otherwise the operational 
sequence is continued with an operation S350. 

In an operation S350, the method for labeling a multimedia information is completed. 

A following retrieval of information is preferably based on the typed labeling / indexing 
information being determined and associated to the multimedia information; however, free text 
search is possible as well. 

Fig. 3d illustrates example membership function plots corresponding to the calendar entries 
depicted in Fig. 3a and several timestamps of different individual multimedia data according to 
an embodiment of the invention. Fig. 3e shows a depiction of context information of a selection 
of individual multimedia data including indexing information in accordance with the calendar 
entries depicted in Fig. 3a and obtained in accordance with an embodiment of the present 
invention. Fig. 3d and Fig. 3e relate to the same individual multimedia data and are 
described below. 

The plot diagram referred to in Fig. 3d contains membership functions for the example calendar 
entry #i and the example calendar entry #j which are both extended by certain different overheads 
in time. As aforementioned the period of time defined in calendar entry #i extends from 6:00 pm 
to 7:00 pm, whereas the corresponding associated extended timeframe extends from 5:58 pm to 
7:06 pm. Analogously, the period of time defined in calendar entry #j extends from 7:10 pm to 
7:25 pm, whereas the corresponding associated extended timeframe extends from 7:04 pm to 
7:28 pm. Both the membership function associated to calendar entry #i and the membership 
function associated to calendar entry #j are sectionalized linear membership functions. The 
membership function #i rises from 5:58 pm to 6:02 pm (from membership grade zero to 
membership grade one) and drops from 6:46 pm to 7:06 pm (from membership grade one to 
membership grade zero), respectively, whereas it is constant in the period of time between 
(membership grade one). The membership function #j rises from 7:04 pm to 7:08 pm (from 
membership grade zero to membership grade one) and drops from 7:20 pm to 7:28 pm (from 
membership grade one to membership grade zero), respectively, whereas it is constant in the 
period of time between (membership grade one). The membership function #i and membership 
function #j differ in their shape concerning the dropping period. In this example, the shape of 
the membership functions may depend on the length of the timeframe. 

Additionally, the periods of time of recording an audio multimedia data #a, an audio multimedia 
data #b and an audio multimedia data #c are schematically depicted above the membership 
functions as gray colored horizontal bars. Further, the moments of time of taking an image 
multimedia data #a and an image multimedia data #b are schematically depicted above the 
membership functions as gray colored dots. The respective exact periods of time of the audio 
multimedia data #a, the audio multimedia data #b and the audio multimedia data #c as well as 
the moments of time of the image multimedia data #a and the image multimedia data #b can be 
seen in the respective context information illustrated in Fig. 3e. 

The audio multimedia data #a is recorded within the period of time extending from 6:14 pm to 
6:22 pm. In accordance with the above described method for generating labeling information the 
period of time of the audio multimedia data #a matches with the (extended) timeframe of 
calendar entry #i and the audio multimedia data #a is partitioned into two parts in accordance 
with a partitioning time value of 6 minutes, the first part extending from 6:14 pm to 6:20 pm 
whereas the second part extends from 6:20 pm to 6:22 pm. The first as well as the second part 
are within the extended timeframe of calendar entry #i such that both parts of the audio 
multimedia data #a are labeled / indexed with the calendar entry #i (depicted herein) or 
information obtained from the calendar entry #i (not depicted herein), respectively. The 
membership grades of both parts are determined as μ = 1.0. The parts (indicated by indices 1 and 
2) are illustrated additionally in the plot diagram shown in Fig. 3d as black colored horizontal 
bars in accordance with their respective membership grades and in correspondence with their 
partitioned periods of time. 



The audio multimedia data #b is recorded within the period of time extending from 6:46 pm to 
7:10 pm. In accordance with the above described method for generating labeling information the 
period of time of the audio multimedia data #b matches with the (extended) timeframe of 
calendar entry #i and with the (extended) timeframe of calendar entry #j. Correspondingly, 
labeling / indexing is performed with respect to both the calendar entry #i as well as the calendar 
entry #j. 

In a first section the labeling / indexing in reference to calendar entry #i will be described. The 
period of time of the audio multimedia data #b matches with but does not fit to the (extended) 
timeframe of calendar entry #i such that audio multimedia data #b is sectioned and the fitting 
audio multimedia data #b section extending from 6:46 pm to 7:06 pm will be further processed. 
The fitting audio multimedia data #b section is partitioned into four parts in accordance with a 
partitioning time value of 6 minutes, the first part extending from 6:46 pm to 6:52 pm, the 
second part extending from 6:52 pm to 6:58 pm, the third part extending from 6:58 pm to 
7:04 pm whereas the fourth part extends from 7:04 pm to 7:06 pm. 

The first to fourth parts are within the extended timeframe of calendar entry #i such that these 
parts of the audio multimedia data #b section are labeled / indexed with the calendar entry #i 
(depicted herein) or information obtained from the calendar entry #i (not depicted herein), 
respectively. The membership grade of the first part is determined as μ = 0.85 which represents an 
average membership grade obtained over the period of time relating to the first part in 
conjunction with the shape of the membership function in this period of time. Analogously, the 
membership grade of the second part is determined as μ = 0.55, the membership grade of the third 
part is determined as μ = 0.25 and the membership grade of the fourth part is determined as 
μ = 0.05, respectively. The parts (indicated by indices 1, 2, 3 and 4) are illustrated additionally 
in the plot diagram shown in Fig. 3d as black colored horizontal bars in accordance with their 
respective membership grades and in correspondence with their partitioned periods of time. 

In a second section the labeling / indexing in reference to calendar entry #j will be described. The 
period of time of the audio multimedia data #b matches with but does not fit to the (extended) 
timeframe of calendar entry #j such that audio multimedia data #b is sectioned and the fitting 
audio multimedia data #b section extending from 7:04 pm to 7:10 pm will be further processed. 
The fitting audio multimedia data #b section is not partitioned any more since the period of time 
of the remaining section corresponds to the partitioning time value. 

The section/part is within the extended timeframe of calendar entry #j such that this section/part 
of the audio multimedia data #b is labeled / indexed with the calendar entry #j (depicted herein) 
or information obtained from the calendar entry #j (not depicted herein), respectively. The 
membership grade of the section/part is determined as μ = 0.67. The part (indicated by index 5) is 
illustrated additionally in the plot diagram shown in Fig. 3d as a black colored horizontal bar in 
accordance with its respective membership grade and in correspondence with its period of time. 

The membership grade within a certain period of time may be obtained from the membership 
function in different ways. The above presented way relates to an averaging process of 
membership grade values within that certain period of time. Alternatively, the membership grade 
may be obtained by extracting a maximum or a minimum membership grade value existing within 
that certain period of time, respectively. Further mathematical methods for obtaining sensible 
membership grades are possible and applicable. 
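
The averaging, maximum and minimum variants can be sketched as follows for sectionalized 
linear membership functions. The function names and the breakpoint tuples are assumptions for 
this illustration; the breakpoints correspond to the membership functions #i and #j of Fig. 3d 
and the periods to the parts of the audio multimedia data #b discussed above, with times in 
minutes since midnight. 

```python
def membership(t, bp):
    """Piecewise linear grade; bp = (rise_start, rise_end, drop_start, drop_end)."""
    rise_start, rise_end, drop_start, drop_end = bp
    if t <= rise_start or t >= drop_end:
        return 0.0
    if t < rise_end:
        return (t - rise_start) / (rise_end - rise_start)
    if t <= drop_start:
        return 1.0
    return (drop_end - t) / (drop_end - drop_start)


def knots(period, bp):
    """Period boundaries plus all breakpoints strictly inside the period."""
    start, end = period
    return sorted({start, end, *(b for b in bp if start < b < end)})


def average_grade(period, bp):
    """Exact mean of the function over the period (trapezoid per segment)."""
    ks = knots(period, bp)
    area = sum((membership(a, bp) + membership(b, bp)) / 2 * (b - a)
               for a, b in zip(ks, ks[1:]))
    return area / (period[1] - period[0])


def extreme_grades(period, bp):
    """Minimum and maximum grade; for linear segments they lie at knots."""
    values = [membership(k, bp) for k in knots(period, bp)]
    return min(values), max(values)


BP_I = (1078, 1082, 1126, 1146)   # 5:58, 6:02, 6:46, 7:06 pm
BP_J = (1144, 1148, 1160, 1168)   # 7:04, 7:08, 7:20, 7:28 pm

parts_i = [(1126, 1132), (1132, 1138), (1138, 1144), (1144, 1146)]
print([round(average_grade(p, BP_I), 2) for p in parts_i])
print(round(average_grade((1144, 1150), BP_J), 2))
```

Under these assumptions the averages reproduce the membership grades 0.85, 0.55, 0.25 and 
0.05 stated for the four parts of the audio multimedia data #b with respect to calendar 
entry #i, and 0.67 for the part with respect to calendar entry #j. 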

The audio multimedia data #c is recorded within the period of time extending from 7:12 pm to 
7:28 pm. In accordance with the above described method for generating labeling information the 
period of time of the audio multimedia data #c matches with the (extended) timeframe of 
calendar entry #j and the audio multimedia data #c is partitioned into three parts in accordance 
with the partitioning time value, the first part extending from 7:12 pm to 7:18 pm, the second 
part extending from 7:18 pm to 7:24 pm whereas the third part extends from 7:24 pm to 
7:28 pm. The first, second as well as the third part are within the extended timeframe of calendar 
entry #j such that the three parts of the audio multimedia data #c are labeled / indexed with the 
calendar entry #j (depicted herein) or information obtained from the calendar entry #j (not 
depicted herein), respectively. The membership grade of the first part is determined as μ = 1.0, 
the membership grade of the second part is determined as μ = 0.92 and the membership grade of 
the third part is determined as μ = 0.25, respectively. The parts (indicated by indices 1, 2 and 3) 
are illustrated additionally in the plot diagram shown in Fig. 3d as black colored horizontal bars 
in accordance with their respective membership grades and in correspondence with their 
partitioned periods of time. 

The image multimedia data #a and image multimedia data #b are taken at 6:42 pm and 6:43 pm, 
where both moments of time are within the (extended) timeframe of calendar entry #i. In 
accordance with the above described method for generating labeling information the image 
multimedia data #a and image multimedia data #b are labeled / indexed with the calendar entry #i 
(depicted herein) or information obtained from the calendar entry #i (not depicted herein), 
respectively. The membership grades of the image multimedia data #a as well as the image 
multimedia data #b are determined as μ = 1.0. The image multimedia data #a and the image 
multimedia data #b are illustrated additionally in the plot diagram shown in Fig. 3d as black 
colored dots in accordance with their respective membership grades and in correspondence with 
their moments of time. 

The above described method has been embodied to attribute multimedia information with calendar 
entry information based on a timestamp comparison operation. Further context and sensor 
information may be included in the labeling operation of multimedia information. One possible 
additional information of interest for attributing multimedia information is location information 
in the form of geographical position information or in the form of mapped location information. 
Existing devices, like GPS devices or mobile communication terminal devices implementing 
positioning services, provide data on the location of the devices. Based on the timestamps of both 
the location/position information and the multimedia information, location/position 
information can be attributed to multimedia information. 

Location/position information as provided for e.g. by GPS components may not be sufficient in 
some cases; in contrast, the resolution of positioning information as provided for by calendar 
entry information depends only on the indications of the user (street number, floor number, 
room number, etc.). In case both kinds of information, i.e. location/position information such as 
GPS data and location as part of a calendar entry, are available, the one should be selected that 
provides the better precision. At the same time, the other data can be used in order to validate 
the used location/position information. The location/position information obtained, for example, 
from a GPS component could even be used in order to validate a complete calendar entry and/or 
to contribute to the membership grade determined in conjunction with the calendar entry. 

The labeling of personal multimedia information by means of personal calendar entries provides 
for a highly reliable approach towards the labeling / indexing and subsequent retrieval of 
multimedia information. In contrast to content analysis methods for multimedia information this 
inventive approach provides for a better recall and also for a better precision. Further, this gain in 
retrieval quality is reached with almost no additional costs, including both any kind of required 
effort on the side of the user and computational costs. 

The presented method for extracting meta-information being used for retrieval of information 
associates a huge amount of meta-information with any user provided information (multimedia 
information). Each user provided information overlapping in time with a calendar entry is 
provided with meta-information for retrieval taken from the information comprised in that 
calendar entry. That means, for example relating to a name field comprising information about 
persons being members of a meeting, that user provided information, e.g. recorded dictation, 
images, downloaded information such as documents and the like, is associated with each name of 
the meeting members such that a retrieval in accordance with one of the names of the meeting 
members results in a list of retrieval results comprising at least those information parts which 
have been provided by the user of the electronic device during the meeting. 






Moreover, the aforementioned method describes in detail how to obtain a membership grade value 
which serves as a measure for evaluating a retrieval. Such a measure is useful, in view of a huge 
amount of retrieval results, for sorting and/or filtering the retrieval results, providing a reliable 
processing of the retrieval results. A pre-defined or user-defined membership grade threshold 
may be defined, and retrieval results are presented to a user on a retrieval instruction only in case 
the membership grade values defined for the respective retrieval results exceed the defined 
membership grade threshold. This makes it easier to find the intended retrieval result since the 
number of retrieval results is limited therewith. 

Additionally, the membership grade values are suitable for sorting multiple retrieval results, for 
example in descending order of value. The user may start manually selecting relevant 
retrieval results beginning with those retrieval results which have the highest membership grade 
values. 
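
The filtering and sorting of retrieval results by membership grade can be sketched as follows. 
The function name `select_results`, the result labels and the threshold value are hypothetical 
and chosen for illustration only; the grades are those of the four parts of the audio 
multimedia data #b with respect to calendar entry #i. 

```python
def select_results(results, threshold):
    """Keep results whose grade exceeds the threshold, highest grade first.

    results is a list of (label, membership grade) pairs.
    """
    kept = [r for r in results if r[1] > threshold]
    return sorted(kept, key=lambda r: r[1], reverse=True)


# Hypothetical retrieval results for a query matching calendar entry #i.
results = [
    ("audio #b part 3", 0.25),
    ("audio #b part 1", 0.85),
    ("audio #b part 4", 0.05),
    ("audio #b part 2", 0.55),
]
for label, grade in select_results(results, threshold=0.2):
    print(label, grade)
```

With the assumed threshold of 0.2 the fourth part is suppressed and the remaining parts are 
presented in the order part 1, part 2, part 3. 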

III. Associating Context Information to Multimedia Information 



As aforementioned in view of multimedia information being associated with information 
extracted from an electronic calendar and from its calendar entries, the following embodiment 
will address the associating of context information to audio (multimedia) information. Here, the 
context information is to be understood as any information selected and indicated by a user to be 
associated to the audio information. 

More particularly, a most natural human method to express and formulate human ideas and to 
exchange views is to express and formulate them linguistically. Common dictation machines take 
account of this natural human behavior but conventionally, dictation machines are limited to a 
kind of serial operation mode only allowing to record speech/voice information. Analogously to 
the above discussed disadvantages of current multimedia information (data), such audio 
information is also subject to the problem that information retrieval systems may not 
have enough interpretable information about the content of the audio information to allow 
reliable retrieval and provide useful results thereof. Additionally, serial audio recording of 
human ideas and views is indeed a most natural process therefor, but it is advantageous in view of 
the expressing of ideas and views as well as in regard to the ideas and views themselves to associate 
accompanying context information to the audio information, which may be any sensor 
information, any information allowing a reliable labeling / indexing of the audio information, any 
information amplifying or completing the audio information such as images, sketches, video 
sequences and the like. 

In the following such an enhanced audio dictation apparatus will be presented allowing to 
associate any type of further information to a recorded audio information. 

Fig. 4a illustrates an operational sequence for recording audio information and associating 
additional information of any type to the recorded audio information according to an embodiment 
of the invention. 

The operational sequence for recording audio information may be operated by an adequate 
apparatus or device at least being equipped with an audio sensor such as a microphone for 
sensing audio signals, an audio recording component for recording either analog audio signals or 
a digitized audio information being obtained by an analog-to-digital converter which is supplied 
with the analog audio signals, a recording storage again either suitable to receive and store analog 
audio signals or digitized audio information, and means for controlling the recording operation 
such as specific keys, a keypad or a keyboard. More particularly, the apparatus or device may 
comprise additionally an audio reproducer such as a speaker for reproducing the recorded audio 
information. 

In case of digitizing the sensed audio signals, the audio recording component may further include 
an audio compressing component for compressing digital audio information of the 
analog-to-digital converter and for decompressing the compressed digital audio information to be 
supplied to a digital-to-analog converter for being reproduced. The recording storage may be any 
storage device and/or medium capable to store and provide digital information. Conventionally, 
memory, magnetic disk storage, removable storage medium devices or mass storage systems 
connected via a wired or wireless communication network are used for storing digital information. 

In view of associating additional information of any type to a recorded audio information, and 
particularly multimedia information, the apparatus or device implementing audio recording 
capability should further have the capability to generate, retrieve, receive, process or the like the 
additional information. 

In view of the following description of the operational sequence, it shall be assumed that the 
operational sequence is operated by a mobile terminal and more particularly by a mobile 
communication terminal having audio recording and reproducing capability and being able to 
handle information such as multimedia information, more precisely a personal digital assistant 
(PDA) or an enhanced mobile communication device having any multimedia capability. 

In the operation S400, the operational sequence for associating additional information of any type 
to an audio recording is started. 

In the operation S410, a user input is awaited; on a trigger signal caused by a user input, 
for example by the user operating one of the dedicated keys, a corresponding key selection signal 
is received for being evaluated. 

In the operation S420, it is checked whether a "start" key has been operated by the user. In case 
the "start" key has been operated the operation sequence continues with an operation S425, 
otherwise the operational sequence continues with an operation S430. 

In the operation S425, the audio recording is started. In the case of a digital processing 
terminal, the sensed audio signals are converted into an audio (information) stream representing 
the digital audio information. The digital audio information may be compressed digital 
audio information resulting from applying any state-of-the-art audio compression codec. 
The audio stream obtained during the recording of audio signals is continuously stored in 
a digital storage, e.g. in a memory or in a mass storage such as a removable magnetic or optical 
medium storage. In view of the amount of storage capacity required for digital audio information 
and/or with regard to a legal requirement for permanently storing the digital audio information, a 
storage employing a removable magnetic or optical medium may be recommended for storing the 
audio stream. 

The audio signal recording having been started by the user with operating the "start" key may be 
operated continuously until a dedicated stopping or pausing of the audio signal recording is 
instructed by the user. 

In the operation S430, it is checked whether a "keyword" key has been operated by the user. In 
case the "keyword" key has been operated the operation sequence continues with an operation 
S435, otherwise the operational sequence continues with an operation S440. 

In the operation S435, keywords are recorded. The recording of keywords may be operated in 
parallel to the audio signal recording. That is, the audio stream resulting from the audio signal 
recording may be split into two different audio (information) streams, one being the 
aforementioned audio (information) stream and the other being an audio keyword (information) 
stream, such that it is possible to store both audio streams separately, for example one on a 
removable storage medium coupled to the device having audio recording capability and 
the other in the memory of the device having audio recording capability. 

Alternatively, the aforementioned audio (information) stream relating to the continuous audio 
recording may be paused such that the sensed audio signals are only converted into the audio 
keyword (information) stream for being stored separately from the aforementioned audio 
(information) stream. 

In the operation S440, it is checked whether an "attach" key has been operated by the user. In
case the "attach" key has been operated the operation sequence continues with an operation S445,
otherwise the operational sequence continues with an operation S450.

In the operation S445, the attach command allows the user to associate additional information of
any type with the audio information currently being recorded, where the audio signal recording
may continue or may be interrupted. In particular, the user may obtain any additional
information of any type to be associated with the recorded audio information. That further
information may be obtained by any procedure provided by the device in use. For example, the
recording device may include a digital camera for taking images or video sequences. These images
and/or video sequences may amplify the audio information with regard to its intellectual
content. Analogously, information to be associated with the recorded audio information may be
obtained via a wired or wireless communication interface from an information servicing entity,
or may be retrieved from the memory of the device or from a storage medium coupled to the
device. The possibilities for obtaining additional information to be associated with the
recorded audio information are not limited to the aforementioned procedures but depend on the
capabilities of the device in which the audio signal recording capability is embedded. Those
skilled in the art may extend the provided list of procedures, since the manner of obtaining the
additional information is outside the scope of the present invention.

In the operation S450, it is checked whether a "pause" key has been operated by the user. In case
the "pause" key has been operated the operation sequence continues with an operation S455,
otherwise the operational sequence continues with an operation S460.

In the operation S455, the recording of audio signals is paused. The audio signal recording may
be resumed on instruction of the user.

In the operation S460, it is checked whether a "continue" ("cont.") key has been operated by the
user. In case the "continue" key has been operated the operation sequence continues with an
operation S465, otherwise the operational sequence continues with an operation S470.

In the operation S465, the recording of audio signals is continued. The continuing of the audio
signal recording allows a user to resume the audio signal recording after having instructed to
pause the audio signal recording (compare with operation S450 and operation S455).

In the operation S470, it is checked whether a "stop" key has been operated by the user. In case
the "stop" key has been operated the operation sequence continues with an operation S475,
otherwise any further operation may be performed (not shown).

In the operation S475, the audio signal recording is stopped completely. The audio information
resulting from the continuous audio stream is complete.

In the operation S480, context information arising during the audio signal recording is
accumulated in associated audio context information. In addition to the audio information, each
activity performed by the user during the audio signal recording has been logged, such that a
complete protocol / description can be generated therefrom. In particular, context information
relating directly to the audio signal recording is included in the protocol, e.g. timestamp
information relating to the starting / stopping of the audio recording, the keyword recording
operation and the pausing / resuming operation, as well as linking information relating to
address information from which to retrieve the audio information resulting from the audio
(information) stream, the audio information resulting from the keyword audio (information)
stream and/or additional information of any type that the user has instructed to be associated
with the audio information or audio recording event. More general context information can also
be added to the protocol / description, such as an identification of the recording device, of
the speaker and of the employed storage (e.g. an identification of the storage medium),
location/position information relating to the place of recording, and more.

A possible realization of such a protocol / description will be described with reference to Fig. 4c. 

In the operation S490, the method for associating additional information of any type to an audio
recording is completed.
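
The key evaluation of operations S410 to S475 and the logging of operation S480 can be pictured
as a small state machine. The following Python sketch is illustrative only: the class name
`Recorder`, the state names, the rule that keywords are accepted only while recording, and the
tuple layout of the log are assumptions made here, not part of the described embodiment, and the
actual audio capture is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Toy model of the key-driven recording flow (operations S410 to S480)."""

    state: str = "idle"  # one of: idle, recording, paused, stopped
    log: list = field(default_factory=list)  # accumulated context information

    def on_key(self, key, time, payload=None):
        """Evaluate one key selection signal; return False if it does not apply."""
        if key == "start" and self.state == "idle":
            self.state = "recording"
        elif key == "keyword" and self.state == "recording":
            pass  # keywords would go to a separate keyword stream
        elif key == "attach" and self.state in ("recording", "paused"):
            pass  # payload names the object to associate
        elif key == "pause" and self.state == "recording":
            self.state = "paused"
        elif key == "continue" and self.state == "paused":
            self.state = "recording"
        elif key == "stop" and self.state in ("recording", "paused"):
            self.state = "stopped"
        else:
            return False
        self.log.append((time, key, payload))  # every accepted activity is logged
        return True


# Replay of one possible key sequence; the integer times are made-up seconds.
recorder = Recorder()
for time, key, payload in [
    (0, "start", "AUDIO1"),
    (20, "keyword", "KEYWORDS_AUDIO1"),
    (40, "pause", None),
    (45, "attach", "IMAGE1"),
    (60, "continue", None),
    (90, "stop", None),
]:
    recorder.on_key(key, time, payload)
```

After the replay, `recorder.log` holds one entry per accepted key press, which is the raw
material from which a protocol / description could be generated.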

The operation of the audio signal recording has been described in view of dedicated keys
operable by a user for instructing the described operations / commands. Such commands for
controlling the audio signal recording may also be generated by operating keys having multiple
assignments or by a textual or graphical menu offering items corresponding to audio signal
recording functions for selection by the user.

Fig. 4b shows a sequence of operations in time illustrating an example usage sequence according
to an embodiment of the invention. The operations in time may be seen as a dictation of a user
summarizing the results of a meeting.

In a first operation the user starts the audio signal recording for dictating the summary for
later use or for later recollection. On indication of the user to start audio signal recording
(the user may select the key "start") the mobile terminal having audio recording capability
starts recording the received audio stream to a removable storage medium (magnetic disk, optical
disk, memory card etc) coupled to the mobile terminal device via a respective storage medium
interface (drive, interface etc). A designation may be defined manually by the user or
automatically by the recording component and is assigned to the audio information for later
retrieval. Herein, the resulting audio information is provided with the designation "AUDIO1".
The starting time (timestamp) of the audio recording is stored, for example in the memory of the
device or in a memory assigned to the audio recording component which is responsible for the
audio signal recording.

The user dictates text to the mobile terminal which is sensed by an audio sensor, converted into
a digital audio stream and stored accordingly.

The user indicates that keywords should be recorded and, correspondingly, pushes and holds the
"keyword" key. On this indication of the user, the audio signal recording of one or more
keywords is initiated. As aforementioned, the audio signal recording of keywords is stored
separately from the audio information dictated before, and the audio recording operation
initiated before is either continued or paused during the recording of the one or more keywords.
The starting time of the keyword recording is stored.

By releasing the "keyword" key the user indicates that the keyword recording has been completed,
and the recording of the dictation is continued. The stopping time of the keyword recording may
be stored.

The user subsequently instructs the audio recording to be paused by selecting a dedicated
"pause" key. The recording of the audio signals is paused accordingly. The starting time of the
initiated pause is stored. Now the user may use other functions or other applications provided
by the mobile terminal. Herein, the user takes an image with a digital camera which is built
into the mobile terminal or coupled to it in any way for image data transmission. The user
indicates that this image is to be associated with (attached to) the recorded audio information,
for example by operating a dedicated "attach" key. In view of the example relating to a meeting,
the picture may be taken of a sketch board to amplify the dictation. The taken image is stored
(here in the memory or any other available storage area) and a manually or automatically defined
designation (herein "IMAGE1") is assigned to the image, which enables the image to be retrieved
later. The time of attaching the image to the audio information is stored.

Subsequently, the user resumes the audio signal recording by operating the "continue" key. The 
resuming time is stored. The user carries on dictating information. 

Finally, the audio signal recording is finished on user instruction or on operation of the "stop" 
key by the user, respectively. The stopping time of the audio recording is stored. 

After the user has indicated that the audio signal recording is finished, the mobile terminal
may move information stored so far in a non-permanent storage to a permanent storage, if
available. The protocol / description is generated in accordance with the above described user
and terminal operations. A corresponding example description which may be obtained is shown in
Fig. 4c.

Assuming that the mobile terminal with the audio recording capability supports a file system
comparable to that known from traditional computers, the following files may have been generated
after the stopping of the audio recording:
AUDIO1,
KEYWORDS_AUDIO1,
IMAGE1, and
DESCRIPTION1;

where the file "AUDIO1" contains the digitized audio information of the dictation, the file
"KEYWORDS_AUDIO1" contains the digitized audio information of the dictated keywords, the file
"IMAGE1" contains the picture taken during the dictation and the file "DESCRIPTION1" contains
the description / protocol of the audio recording. The storage locations of the resulting files
are arbitrary, but the files should be accessible for reproduction on user demand. An example
coding and organization of the file "DESCRIPTION1" is depicted in Fig. 4c.

Fig. 4c shows a possible example description in view of the example usage sequence presented in
Fig. 4b and according to an embodiment of the invention. The description / protocol of the audio
recording may be coded as an extensible markup language (XML) coded description / protocol. A
corresponding document type definition (DTD) defines the structure and elements of the XML-coded
description and ensures an unambiguous parsing of the description.






The example file "DESCRIPTION1" shown in Fig. 4c comprises a header section (lines 1 to 3) which
includes information about the storage medium / storage device where at least the recorded audio
information "AUDIO1" is stored, information about the recording device and information about the
time at which the audio recording has taken place or at which the audio information has been
finally stored. Further information may be included in the header section, such as
identification information about the user who has dictated the audio information and the like.

The protocol information is coded in a body section, herein extending from line 6 to line 29. In
a first subsection of the body a start time and a stop time are defined. The start time is
defined to be TIME1 and the stop time is defined to be TIME5. In accordance with the usage
operation described in detail with reference to Fig. 4b, this time information has been stored
at the starting and at the stopping of the audio signal recording. Additionally, address
information for the recorded audio information is coded. The address information may be coded in
accordance with the employed data storage system, such as directory/file address information or
more commonly a uniform resource locator (URL) or uniform resource identifier (URI).
Alternatively, the audio information may be embedded directly into the subsection.

A next subsection of the body includes information relating to the keywords. Herein, a moment
of time relating to the recording of the keywords is coded as TIME2. Additionally, addressing
information for the keyword audio information is coded.

In the next subsection of the body the pause operation is coded. According to the above
presented example a pause has been initiated at the moment of time TIME3 and the recording has
been resumed at the moment of time TIME4.

A further subsection of the body includes information about associated information, herein
designated as embedded objects. The moment of time TIME3 at which the additional information has
been associated with the recorded audio information is coded in the subsection, and address
information about the information (object) is coded in the subsection in order to allow the user
access to this associated information.

Following subsections of the body include further context information. Herein location/position
information and other information relating to additional sensors are coded, such as sensor
information obtained from an ambient light sensor, acceleration sensor(s), background audio
signals and background audio signal analysis results and the like.

The detailed XML-coded description / protocol with reference to Fig. 4c is one possibility to
perform a logging of context information during the audio signal recording. Another possibility
of logging context information is described with respect to the personal logbook mentioned
above.
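
As an illustration of how such an XML-coded description might be produced, the sketch below
builds a small document with Python's standard `xml.etree.ElementTree` module. The element and
attribute names (`description`, `header`, `body`, `audio`, `keywords`, `pause`, `object`, `href`),
the device and medium identifiers and the clock times are assumptions chosen for this example;
the actual tags of Fig. 4c and its DTD are not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_description(start, stop, audio_href, keywords, pauses, objects,
                      device="DEVICE1", medium="MEDIUM1"):
    """Build a hypothetical XML description of one audio recording."""
    root = ET.Element("description")

    # Header: where the audio is stored, which device recorded it, and when.
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "medium", {"id": medium})
    ET.SubElement(header, "device", {"id": device})
    ET.SubElement(header, "stored", {"time": stop})

    # Body: one subsection per logged activity.
    body = ET.SubElement(root, "body")
    ET.SubElement(body, "audio", {"start": start, "stop": stop, "href": audio_href})
    for time, href in keywords:
        ET.SubElement(body, "keywords", {"time": time, "href": href})
    for begin, end in pauses:
        ET.SubElement(body, "pause", {"start": begin, "stop": end})
    for time, href in objects:
        ET.SubElement(body, "object", {"time": time, "href": href})
    return root


# Made-up clock times standing in for the logged timestamps.
doc = build_description(
    start="10:00", stop="10:15", audio_href="AUDIO1",
    keywords=[("10:03", "KEYWORDS_AUDIO1")],
    pauses=[("10:07", "10:09")],
    objects=[("10:08", "IMAGE1")],
)
xml_text = ET.tostring(doc, encoding="unicode")
```

Storing only addresses (`href`) keeps the description small; embedding the audio directly, as the
text also permits, would require an encoding step that is left out of this sketch.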

Translated into the wording which has been used in conjunction with the detailed description of
the personal logbook, the recorded audio information as well as user commands concerning the
operation of the audio signal recording may be logged as user input information. Further
information relating to the associating of information with the recorded dictation may be
provided to the personal logbook as context information. In this case the labeling and/or
indexing of the audio information (which represents user input information) results in the same
interdependency structure as described in view of the audio information description / protocol
referred to in Fig. 4c and with regard to the following graphical depictions referred to in
Fig. 4d and Fig. 4e.

Speech / voice recognition systems/applications are improving and will gain significance in the
near future, since the capability to convert information provided by speech / voice input into
text based information is time saving. Besides the pure text based information obtained from
information provided by speech / voice input, the original audio recording will be preserved so
that the speech recognition (translation) can be checked at any time. Such a speech / voice
recognition may be employed for the recorded audio information, where the recognition may be
operated in the recording device or by a speech recognition service provided by any external
entity accessible via a communication network, in particular via a mobile communication network
in case of a mobile recording terminal. The text based information resulting from a speech /
voice recognition can be included directly into the description / protocol information of the
audio recording, or address information for retrieving the text based information may be
included. The speech / voice recognition may be applied to the dictation and/or to the keyword
dictation.

A retrieval system allowing a user to retrieve certain audio information in accordance with
request information may employ the audio keyword recording by matching request information
input by audio recording with the audio keywords previously recorded during the dictation of the
audio information. Moreover, in case of using speech / voice recognition, the keywords may be
available as text based keywords. In this case, the retrieval may also be based on text
comparison with text information included in the retrieval request. Further information included
in the description / protocol of the recording may be employed for retrieval by using that
information for labeling / indexing the audio recording as aforementioned.

The protocol / description information may be employed to create a graphical depiction of the
logged information for providing a flexible mind-map supporting recognition and retrieval of
information. The following Fig. 4d and Fig. 4e both illustrate example mind-maps, where the
mind-map depicted in Fig. 4d relates to a graphical user interface and the mind-map depicted in
Fig. 4e relates to a textual oriented user interface.
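
Before turning to the figures, the text-based retrieval mentioned above, in which the words of a
request are compared with keywords obtained by speech recognition, might be sketched as follows.
The function names, the dictionary layout and the rule that every request word must match are
assumptions of this example; matching of spoken requests against recorded audio keywords is not
covered.

```python
def build_index(descriptions):
    """Map each lower-cased keyword to the set of recordings that carry it.

    `descriptions` maps a recording designation to its list of text keywords.
    """
    index = {}
    for name, keywords in descriptions.items():
        for word in keywords:
            index.setdefault(word.lower(), set()).add(name)
    return index


def retrieve(index, request):
    """Return the recordings that match every word of the request, sorted by name."""
    words = request.lower().split()
    if not words:
        return []
    hits = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*hits))


# Hypothetical keywords as they might result from speech recognition.
index = build_index({
    "AUDIO1": ["budget", "meeting"],
    "AUDIO2": ["meeting", "travel"],
})
matches = retrieve(index, "Budget meeting")
```

The same index could be extended with other fields of the description, such as the place or
date of recording, to support the labeling / indexing described in the text.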

Fig. 4d graphically illustrates a depiction of protocol / description information according to
an embodiment of the invention. The graphical depiction may comprise only an extract of the
information available with the audio information. The graphical depiction shows a first
graphical representation of a recorded audio information object 402 denoted as (audio) clip
#002. Further, the graphical depiction of the audio information object 402 includes information
about the recording date / time and the place of recording. A plurality of further graphical
representations are arranged around the depicted audio information object 402. Lines linking the
graphical representations represent interdependencies such as associations. A set of graphical
representations 403 relates to keywords, namely keyword object #4, keyword object #5 and keyword
object #6. The linking lines connecting the graphical representations of each of the keyword
objects #4, #5 and #6 with the audio information object 402 indicate that those keyword objects
#4 to #6 belong to this audio information object 402. Further, a set of graphical
representations 404 is depicted and each of the graphical representations is linked by a
connecting line to the audio information object 402. Correspondingly, the linking lines indicate
that the image object #1, the image object #2 and the text object #1 are associated with the
audio information object 402, i.e. with audio clip #002.

Analogously, the graphical depiction shows a second graphical representation of a recorded 
audio information object 400 being denoted as (audio) clip #001 and the audio information 
object 400 has associated keyword objects #1, #2 and #3 shown as a set of graphical 
representations 401 each of which is connected via a line with the audio information object 400. 

The graphical depiction may be used not only as a memory aid but also as a graphical user
interface. For example, the user may select one of the graphical representations in order to
retrieve the respective indicated information. That is, in case the user selects a keyword
object the mobile terminal reproduces the corresponding recorded digital audio sequence.
Analogously, in case the user selects audio clip #002 the corresponding dictation is reproduced,
in case the user selects one of the graphical representations of the image objects an image
viewer application shows the corresponding image to the user, and in case the user selects the
graphical representation of the text object a text viewer application shows the corresponding
text to the user.
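
The associations drawn as linking lines, and the type-dependent reaction to a selection, can be
modelled with a very small data structure. In the sketch below the class names, the `address`
values and the application labels in `HANDLERS` are hypothetical; only the object types and the
clip / object numbering follow the figure description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapObject:
    name: str     # e.g. "clip #002" or "image object #1"
    kind: str     # "audio", "keyword", "image" or "text"
    address: str  # where the underlying information is stored


# Hypothetical mapping from object type to the application presenting it.
HANDLERS = {
    "audio": "audio player",
    "keyword": "audio player",
    "image": "image viewer",
    "text": "text viewer",
}


class MindMap:
    def __init__(self):
        self.links = {}  # clip name -> list of associated MapObject

    def associate(self, clip, obj):
        """Record a linking line between a clip and another object."""
        self.links.setdefault(clip.name, []).append(obj)

    def associated(self, clip, kind=None):
        """Objects linked to a clip, optionally filtered by type."""
        return [o for o in self.links.get(clip.name, [])
                if kind is None or o.kind == kind]


def select(obj):
    """Return (application, address) used to present the selected object."""
    return HANDLERS[obj.kind], obj.address


clip2 = MapObject("clip #002", "audio", "AUDIO2")
mind_map = MindMap()
mind_map.associate(clip2, MapObject("keyword object #4", "keyword", "KEYWORDS_AUDIO2"))
mind_map.associate(clip2, MapObject("image object #1", "image", "IMAGE1"))
mind_map.associate(clip2, MapObject("text object #1", "text", "TEXT1"))
```

A tabular, text-oriented view can be produced from the same `links` dictionary by listing the
associated objects of each clip per type.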

Fig. 4e illustrates a textual oriented depiction of a protocol / description information according to 
an embodiment of the invention. This textual oriented depiction is an alternative depiction 
showing the same properties and information to the user as described with reference to Fig. 4d. 

The logged attributes of the recorded audio information objects audio clip #001 and audio clip 
#002 are presented in a table depiction. The table depiction includes the elements clip 
designation, date and time of recording, place of recording, and address information of the 
associated information, associated text objects, associated keyword objects and associated image 
objects. The tabular depiction allows a clear overview of the information / objects which have 
been associated by user instructions to the respective audio information. 

Analogously, selecting an address information entry in the depicted table results in
reproducing / displaying the respective selected information in accordance with the type of the
information (object).

IV. Watermarking / Embedding Information into Audio Data Sequences

The aforementioned methods relating to a reliable retrieval and to sensitive retrieval results
require additional information containing labeling and/or indexing information. That is, the
information alone, in reference to which the retrieval, labeling and/or indexing by accompanying
meta-information is described, is of limited usefulness. Therefore, it is advantageous to offer
a method for embedding or appending at least a limited amount of meta-information directly
into/to information and more particularly directly into audio information.

The inventive concept for appending and/or embedding information to/into an audio information
sequence provides several advantages besides the mentioned possibility to complete the audio
information sequence with labeling and/or indexing information.

The inventive concept further relates to a method for embedding information of any type into an
audio sequence which ensures that the embedded information is preserved independently of
converting and transcoding of the audio sequence. That is, the embedded information is preserved
in the audio sequence although the audio sequence is converted from an analog encoded audio
sequence to a digital encoded audio sequence and vice versa, from a digital encoded audio
sequence to an analog encoded audio sequence. Further, the embedded information is preserved in
the audio sequence although the audio sequence is transcoded from an audio sequence encoded in a
first digital audio format, including any audio compression format, into an audio sequence
encoded in a second digital audio format.

Besides the possibility to embed information of any type relating to context information,
meta-information and the like into an audio sequence, the capability to embed information
according to an embodiment of the invention may be employed to embed authentication and/or
watermarking information which may also be encrypted.

The inventive concept to append and/or to embed information to/into audio information sequences
is based on an octave-based coding method which allows coding of characters representing the
appended or embedded information. A well known character encoding is the ASCII encoding, which
uniquely assigns a sequence of 7 or 8 bits to each character/symbol, resulting in code tables
that uniquely assign each bit sequence of 7 or 8 bits to one character/symbol. Depending on the
appended or embedded information to be coded, the total number of characters/symbols required
may differ, such that correspondingly sized coding tables may be employed. In the following, a
bit sequence of 6 bits will be employed to describe the inventive method; this bit sequence
allows 64 different characters/symbols to be coded. Such a code table comprising 64 different
elements should be suitable to code HTML (hypertext markup language) tags, URL (uniform resource
locator) sequences, URI (uniform resource identifier) sequences and the like, for the coding of
which the numbers 0-9, characters A-Z (26 letters) and a few specific symbols 0/:\-#.@_ are
sufficient.
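
A 6-bit code table of this kind is easy to sketch. In the example below the ordering of the
table and the particular set of special symbols are assumptions (the text only requires the
digits, the letters A-Z and a few symbols); 44 of the 64 available codes are used, and lower-case
input is folded to upper case.

```python
import string

SYMBOLS = "/:\\-#.@_"  # assumed set of special symbols: / : \ - # . @ _
ALPHABET = string.digits + string.ascii_uppercase + SYMBOLS
assert len(ALPHABET) <= 64  # every character must fit into 6 bits

# Character -> 6-bit string, and the reverse table for decoding.
ENCODE = {ch: format(i, "06b") for i, ch in enumerate(ALPHABET)}
DECODE = {bits: ch for ch, bits in ENCODE.items()}


def encode(text):
    """Encode text as a string of 6-bit groups (input is upper-cased)."""
    return "".join(ENCODE[ch] for ch in text.upper())


def decode(bits):
    """Decode a string of 6-bit groups back into text."""
    return "".join(DECODE[bits[i:i + 6]] for i in range(0, len(bits), 6))


encoded_url = encode("http://a.b/c_1")
```

Each 6-bit value would then be mapped to its own combination of tones; that mapping is a
separate design step and is not part of this table.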

Each character/symbol is coded by a unique combination of tones which is repeated over more
than one order (octave). For example, relating to the musical scale comprising the tones
C, D, E, F, G, A and H, such a coding could be expressed by

"1"={C,E,A};
"2"={D,E,H};

"3"={C,F3}; 

where three different tones are employed for each tone basis forming a code. As aforementioned,
the codes are repeated over all the octaves that are within the sensitivity range of the audio
system, i.e. the frequency range used for audio processing/recording.

Fig. 5a illustrates a frequency range of an audio recording / reproducing system including
several complete octaves. An index n shown in the frequency range enumerates the depicted
octaves, ranging from 1 to 7 and corresponding to the seven octaves subsequent in frequency. The
three frequencies indicated by the numerals b1, b2 and b3 in the octave n = 6 relate to a
specific code basis of a certain character/symbol employing three different tones. In accordance
with the aforementioned repetition over all octaves, corresponding frequencies are indicated in
each other octave (1, 2, 3, 4, 5 and 7), which relate to the originally indicated frequencies
b1, b2 and b3.

In the more general case, the code basis for a given character x is given by a distinct sequence
of a pre-defined number of tones which can be expressed as {b_j(x)} = {b_1(x), b_2(x), ...},
where the index j relates to the number of tones being employed, j = 1, 2, ..., j_max.

There are numerous ways to create the code basis. In one embodiment according to the invention,
the code basis is orthogonal, i.e. the code bases of the characters/symbols have no correlation.
According to another embodiment of the invention, the code bases are partly overlapping, i.e. are
partly correlated, which takes account of the fact that audio signals should be able to
withstand high noise.
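
The repetition of a code basis over the octaves, and the difference between orthogonal and
partly overlapping bases, can be illustrated numerically. The sketch assumes equal-temperament
tuning with A4 = 440 Hz and octave 1 starting at C1; neither assumption comes from the
description, which leaves the tone frequencies open. The function names are likewise chosen for
this example.

```python
# Semitone offsets above C within one octave; "H" is the German name for B.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "H": 11}

# Assumed reference: A4 = 440 Hz lies 45 semitones above C1.
C1_HZ = 440.0 * 2 ** (-45 / 12)


def tone_frequency(tone, octave):
    """Frequency in Hz of a tone in the given octave (octave 1 starts at C1)."""
    return C1_HZ * 2 ** (octave - 1 + SEMITONES[tone] / 12)


def code_frequencies(basis, octaves=range(1, 8)):
    """Repeat a code basis such as {C, E, A} over all given octaves."""
    return {n: sorted(tone_frequency(t, n) for t in basis) for n in octaves}


def overlap(basis_a, basis_b):
    """Number of shared tones; 0 means the two bases are orthogonal."""
    return len(set(basis_a) & set(basis_b))


freqs = code_frequencies({"C", "E", "A"})
shared = overlap({"C", "E", "A"}, {"D", "E", "H"})
```

With seven tones and three tones per code, fully orthogonal bases allow only two distinct codes,
which is one practical reason for accepting partly overlapping bases or for using a finer tone
grid.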

Fig. 5c depicts a signal flow chart illustrating the processing of an audio signal s_0(t) for
embedding information into it according to an embodiment of the invention.

The audio signal s_0(t) 530 is assumed to be the original audio signal having no embedded
information. A character/symbol x with code {b_j(x)} = {b_1(x), b_2(x), ...} 540
(j = 1, ..., j_max) is to be embedded into this audio signal s_0(t) 530. The combining of both
the audio signal s_0(t) 530 and the code formulation {b_j(x)} 540 is performed in the frequency
domain of the signal and the code formulation.

Correspondingly, the audio signal s_0(t) 530 is transformed into the frequency domain, resulting
in an audio Fourier transformation A(ω) 535 of the audio signal s_0(t) 530. The code {b_j(x)},
being based on a pre-defined number of tones, can be directly denoted in the frequency domain,
resulting in a modulation signal M_0(ω) 545 corresponding to the code {b_j(x)} 540, which may be
expressed mathematically as:

$$M_0(\omega) = 1 + \sum_{n}\sum_{j} K\,\delta(\omega - n\,b_j),$$

where the index j relates to the tones (frequencies) employed for the code of the
character/symbol x, the index n relates to the frequency sections (octaves) in which the tones
(j = 1, ..., j_max) are repeated such as depicted in Fig. 5a and K is a pre-defined attenuation
factor (K < 1). The aforementioned mathematical definition of the modulation signal M_0(ω) 545
represents an ideal mathematical expression. In practice, delta functions as presented are not
available, such that the delta function is to be replaced with a suitable approximation function
such as a sinc function, a Gaussian function etc, as known to those skilled in the art.

These two signals, the audio Fourier transformation A(ω) 535 as well as the modulation signal
M_0(ω) 545, are to be combined in the frequency domain. According to embodiments of the
invention the Fourier transformation A(ω) 535 and the modulation signal M_0(ω) 545 may be
added or may be multiplied in order to obtain a suitable modulation signal M(ω), respectively.
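
To make the two combining options concrete, the following sketch samples a modulation signal on
a frequency grid, using a Gaussian bump as the approximation of each delta function, and combines
it with a spectrum either by adding or by multiplying. The helper names, the bump width `width`
and the value of `k` are assumptions for illustration; the code tones are passed as frequencies
that have already been repeated over the octaves.

```python
import math


def modulation_signal(freqs, code_tones, k=0.2, width=2.0):
    """Sample M0 on `freqs`: 1 plus a Gaussian bump of height k at each code tone.

    The Gaussian stands in for the ideal delta function; `width` is its
    standard deviation in the same unit as `freqs` (an assumed parameter).
    """
    return [
        1.0 + sum(k * math.exp(-0.5 * ((f - b) / width) ** 2) for b in code_tones)
        for f in freqs
    ]


def combine(spectrum, m0, mode="multiply"):
    """Combine a sampled spectrum A with M0 point-wise by adding or multiplying."""
    if mode == "multiply":
        return [a * m for a, m in zip(spectrum, m0)]
    if mode == "add":
        return [a + m for a, m in zip(spectrum, m0)]
    raise ValueError(f"unknown mode: {mode}")


grid = [0.0, 100.0, 200.0]          # made-up frequency grid in Hz
flat = [2.0, 2.0, 2.0]              # made-up flat spectrum
m0 = modulation_signal(grid, [100.0])
multiplied = combine(flat, m0, "multiply")
added = combine(flat, m0, "add")
```

Multiplying scales the spectrum only near the code tones, whereas adding also shifts the whole
curve by the constant term of M0.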

Fig. 5b shows a set of plot diagrams depicting the audio Fourier transformation A(ω) and
modulation signals M(ω) resulting from the combining of the Fourier transformation A(ω) 535
and the modulation signal M_0(ω) 545 according to an embodiment of the invention.

The first plot diagram illustrates a curve 500 representing the original audio signal s_0(t) in
the frequency domain, i.e. the audio Fourier transformation A(ω) 535. Additionally, indications
above and below the abscissa of the plot diagram, which is the frequency axis, represent both the
frequency sections (octaves) in which the tones of the code are repeated and the tones of the
code, respectively. Herein, three frequency sections (octaves) are depicted and the basis code
comprises a set of two tones {b_j(x)} = {b_1(x), b_2(x)} which is repeated within each frequency
section (octave). Analogously to Fig. 5a, the frequency sections (octaves) are depicted by
indications below the abscissa whereas the tones are depicted by indications above the abscissa.

A second plot diagram illustrates a curve 510 representing the modulation signal M(ω) obtained
from the audio Fourier transformation A(ω) and the modulation signal M_0(ω) by adding in the
frequency domain. A third plot diagram illustrates a curve 520 representing the modulation signal
M(ω) obtained from the audio Fourier transformation A(ω) and the modulation signal M_0(ω) by
multiplying in the frequency domain. Both plot diagrams further include the original audio
Fourier transformation A(ω) as dotted curves 515 and 525, respectively, in order to indicate the
difference produced by adding and multiplying for modulating with the modulation signal M_0(ω).

Both modulation operations, either modulating by adding or modulating by multiplying, are
applicable for deriving a suitable modulation signal M(ω). According to an embodiment of the
invention, obtaining the modulation signal M(ω) by multiplying may be used since, as depicted,
the deviations in the frequency curve caused by the modulation signal M_0(ω) may be smaller,
such that a resulting interference in the final audio signal due to these deviations is less
audible to a user. The attenuation factor K is pre-determined in such a way that the code
embedded in the audio signal can be decoded later while the interference remains low.

As described with reference to Fig. 5b and with respect to the embodiment of the invention
referred to in Fig. 5c, the audio Fourier transformation A(ω) 535 obtained by Fourier
transformation (e.g. performed by a Fourier transformer) of the original signal s_0(t) 530 and
the modulation signal M_0(ω) 545 are multiplied 550 (e.g. by a multiplying component,
multiplier) which results in the modulated signal M(ω). The multiplying of the audio Fourier
transformation A(ω) and the modulation signal M_0(ω) 545 may be operated point-wise.

Subsequently, the resulting modulated signal M(ω) is transformed back into a modulated audio
signal m(t) in the time domain, e.g. by employing an inverse Fourier transformation which can be
operated by an inverse Fourier transformer. In case the modulated signal M(ω) in the frequency
domain is a discrete signal, a fast Fourier transformation may be applied for transforming into
the time domain.

Finally, the modulated audio signal m(t) is combined with the original audio signal s0(t). Owing to
the above described processing operations applied to the original audio signal s0(t) for obtaining
the audio signal m(t), the audio signal m(t) and the original audio signal s0(t) may be spaced in
time by a certain period Δt. A combining of the modulated audio signal m(t) and the original
audio signal s0(t) without taking account of the time difference Δt may cause interference in the
combined signals. Therefore, the original audio signal s0(t) may be delayed, resulting in a delayed
original audio signal s0'(t) = s0(t-Δt) which coincides in time with the modulated audio signal
m(t). The combined final audio signal s(t) 580 is obtained by adding the delayed original audio
signal s0'(t) = s0(t-Δt) and the modulated audio signal m(t) (e.g. operated by an adding component,
adder).
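
The embedding path of Fig. 5c may be illustrated by a short Python sketch. It is an illustration
under stated assumptions only: the function names dft, idft and embed are hypothetical, a naive
discrete Fourier transform stands in for the Fourier transformer, and the modulation signal M0 is
assumed to equal the attenuation factor K on the coded frequency bins and zero elsewhere. The sketch
works offline on a complete list of samples, so the delay Δt is zero and needs no compensation.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(N^2); adequate for a short demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def embed(s0, code_bins, k_att=0.2):
    """Return s(t) = s0(t) + m(t), with m(t) obtained by point-wise multiplication."""
    n = len(s0)
    a = dft(s0)                                   # A(w): spectrum of the original signal
    m0 = [0.0] * n                                # M0(w): assumed modulation signal
    for b in code_bins:
        m0[b] = k_att
        m0[(n - b) % n] = k_att                   # mirror bin keeps m(t) real-valued
    m_freq = [ai * mi for ai, mi in zip(a, m0)]   # M(w) = A(w) * M0(w), point-wise
    m_time = [v.real for v in idft(m_freq)]       # m(t) in the time domain
    return [x + y for x, y in zip(s0, m_time)]    # adder; the delay is zero here


# Demo: three tones on bins 2, 3 and 5; the symbol is coded on bin 3 only.
N = 32
s0 = [sum(math.cos(2 * math.pi * b * t / N) for b in (2, 3, 5)) for t in range(N)]
s = embed(s0, code_bins=[3])
before, after = dft(s0), dft(s)
gain_coded = abs(after[3]) / abs(before[3])       # magnitude gain on the coded bin
gain_other = abs(after[2]) / abs(before[2])       # magnitude gain on an uncoded bin
```

With this choice of M0 the coded bin is raised by the factor 1 + K while the other bins stay
unchanged, which is the small deviation in the frequency curve that the decoder later evaluates.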

In case of a relatively slowly changing original audio signal s0(t) and correspondingly a relatively
slowly changing modulated audio signal m(t), the delay in time Δt of the audio signal m(t) due to
the preceding processing operations is small in comparison with the changes in the audio signals,
such that the compensation of the delay in time Δt is not necessary.

The embedding of a character/symbol x in an audio signal according to an embodiment of the
invention has been illustrated in detail with reference to Fig. 5c and Fig. 5b. The following
Fig. 5d refers to the extracting of an embedded character/symbol from an audio signal according
to an embodiment of the invention.

Fig. 5d shows a block diagram depicting a recovering of information being embedded in an audio
signal according to an embodiment of the invention.

The decoding of an unknown character/symbol and an unknown character/symbol sequence
requires the input of an audio signal s(t) embedding the unknown character/symbol and the
unknown character/symbol sequence, a correlation basis {Bi} and the code bases {bj(x)}, {bj(y)}, ... of the
total set of characters/symbols x, y, ... being possibly coded in the audio signal s(t). For
simplicity, but without limiting thereto, the code basis {bj(x)} corresponding to a character/symbol
x and the code basis {bj(y)} corresponding to a character/symbol y are depicted in Fig. 5d. It is
assumed that the audio signal s(t) has embedded a character/symbol χ.

The audio signal s(t) 600 is transformed into the frequency domain resulting in a Fourier signal
Z(ω) 605, for example by Fourier transformation.

The correlation basis {Bi} 610 can be defined directly in the frequency domain and is represented
by a set of all frequencies being used to code the characters/symbols. The modulation C(ω) 615
corresponding to the correlation basis {Bi} 610 can be mathematically denoted as follows:

$$C(\omega) = \sum_{n} \sum_{i} \delta(\omega - n B_i),$$

where the index i relates to frequencies (tones) included in the correlation basis {Bi} 610 and the
index n relates to the frequency sections (octaves) in which the tones (j = 1, ..., jmax) are repeated
such as depicted in Fig. 5a.

The modulation C(ω) 615, defining all frequencies employed for embedding characters/symbols
in the audio signal s(t), allows all audio signal information which does not relate in any way to
coded characters/symbols to be deleted from the audio signal s(t). The Fourier signal Z(ω) 605 is
convoluted with the modulation C(ω) 615, resulting in a convolution signal Y(ω) representing
frequency magnitude signals of the audio Fourier signal Z(ω) 605 at the frequencies being
comprised in the modulation C(ω) 615 and correlation basis {Bi} 610, respectively. The
convolution may be operated by a convoluting component. In the convolution signal Y(ω), the
frequency magnitude signals of the audio Fourier signal Z(ω) 605 at frequencies being not
comprised in the modulation C(ω) 615 and correlation basis {Bi} 610, respectively, are at least
approximately zero.

In a following operation the overall convolution signal Y(ω) is evaluated at the frequencies
which are associated with a certain code basis. That means, the convolution signal Y(ω) is
evaluated at the frequencies defined by the code basis {bj(x)} 625, at the frequencies defined by
the code basis {bj(y)} 630 and so on. The evaluation results in magnitude values M(x,n) 635,
M(y,n) 640, ... each representing an evaluation value corresponding to the magnitudes of the
frequencies which are assigned to the code basis {bj(x)}, to the code basis {bj(y)}, ...

The magnitude values M(x,n), M(y,n), ... may be obtained by adding the frequency magnitudes at
the frequencies defined by each code basis. The following expression defines mathematically a
magnitude value for the example of the character/symbol x having the code basis {bj(x)}:

$$M(x,n) = \sum_{j} Y\big(n\,b_j(x)\big),$$

where index j relates to frequencies (tones) included in the code basis {bj(x)}. The calculation of
the magnitude values M(x,n), M(y,n), ... may be performed by one summing component or
summer receiving the convolution signal Y(ω) and successively the frequencies of the code
bases {bj(x)}, {bj(y)}, ... to provide correspondingly the magnitude values M(x,n), M(y,n), ... in
sequence, or may be performed by a plurality of summing components or summers each receiving
the convolution signal Y(ω) and one of the sets of frequencies corresponding to one of the code
bases {bj(x)}, {bj(y)}, ... to provide correspondingly the magnitude values M(x,n), M(y,n), ... in
parallel.
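
The summation of the frequency magnitudes may be sketched as follows. The sketch assumes that the
convolution signal Y is available as a mapping from frequency to complex value; the function name
magnitude, the frequencies and the two code bases are hypothetical and serve only as illustration.

```python
def magnitude(y, code_basis, n=1):
    """M(x, n): sum of the magnitudes of Y at the frequencies n * b_j(x)."""
    return sum(abs(y.get(n * b, 0.0)) for b in code_basis)


# Hypothetical convolution signal Y as {frequency in Hz: complex value}.
Y = {262: 1.3 + 0j, 294: 0.2 + 0j, 349: 0.1 + 0j, 392: 1.1 + 0j, 440: 0.2 + 0j}
CODE_BASES = {"x": (262, 392), "y": (294, 440)}   # illustrative bases, not from the text

# One summing loop used in sequence; a parallel variant would run one loop per basis.
M = {symbol: magnitude(Y, basis) for symbol, basis in CODE_BASES.items()}
```

Frequencies that are missing from the mapping contribute zero, which mirrors the statement that
magnitudes outside the correlation basis are at least approximately zero.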

A final evaluation of the magnitude values M(x,n), M(y,n), ... is obtained by comparing each of
them with one another. The comparison of each of the magnitude values M(x,n), M(y,n), ... with each
other may be defined by a quotient of the magnitude values of each character/symbol pair such as
(x,y). The ratio R(x,y,n) 645 may be expressed mathematically as follows:

$$R(x,y,n) = \frac{M(x,n)}{M(y,n)} = \frac{\sum_{j} Y\big(n\,b_j(x)\big)}{\sum_{j} Y\big(n\,b_j(y)\big)},$$

where index j relates to frequencies (tones) included in the code basis {bj(x)} and the code basis
{bj(y)}, respectively. In case of a number of L code bases {bj(zp)}, where p = 1, ..., L, the number
of ratios R(zp,zq,n), where p,q = 1, ..., L and p ≠ q, is in total equal to L*(L-1). Each ratio R(x,y,n)
645 may be calculated by a respective divider or dividing component or, alternatively, the ratios
R(x,y,n) 645 may be calculated and provided by one divider or dividing component
sequentially.



In case of suitable or orthogonal code bases the ratio R(χ,x,n) corresponding to the unknown
character/symbol χ is significantly greater than the ratios R(x,y,n), ... which do not correspond to
the unknown character/symbol χ. Thus, the unknown character/symbol χ is extracted and
determined from the audio signal s(t).
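
The pairwise comparison may be sketched as follows. The magnitude values are hypothetical, and the
selection rule in most_likely (choosing the symbol whose smallest ratio against all other symbols is
largest) is an assumption made for illustration; the description only requires that the ratios of
the embedded symbol are significantly greater than the others.

```python
from itertools import permutations


def ratios(magnitudes):
    """R(x, y) = M(x) / M(y) for every ordered pair of distinct symbols."""
    return {(x, y): magnitudes[x] / magnitudes[y]
            for x, y in permutations(magnitudes, 2)}


def most_likely(magnitudes):
    """Pick the symbol whose smallest ratio against all others is largest."""
    r = ratios(magnitudes)
    return max(magnitudes,
               key=lambda x: min(v for (p, _), v in r.items() if p == x))


M = {"x": 2.4, "y": 0.4, "z": 0.5}      # hypothetical magnitude values for one n
R = ratios(M)                           # L * (L - 1) ratios for L = 3 code bases
best = most_likely(M)
```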

In general, the identification of the unknown character/symbol χ by evaluating the ratios
R(x,y,n), ... may be uncertain. In such a case a probability evaluation of the ratios R(x,y,n), ...
may lead to a more reliable and certain result. The evaluation of a kind of maximum probability
can be employed therefor.

The frequencies of the code bases may be coded as frequencies having a certain length, in
particular of constant time, e.g. approximately 0.1 seconds, wherein the codings of frequencies
belonging to different code bases are separated by a certain duration of time, in particular of a
constant time such as approximately 0.05 seconds. The coding of constant-time frequency signals
and constant-time separation makes it particularly easy to decode the signals. Further, the coded
information may be completed by error correction information, check code information and
redundancy information, such as known from signal processing technology, to enhance the
robustness of the coded information against interference.
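
The constant-time coding can be illustrated by a small timing sketch. The durations follow the
approximate values given above; the function name schedule and the use of integer milliseconds are
assumptions made for illustration.

```python
TONE_MS = 100    # duration of one coded tone group (about 0.1 s in the text)
GAP_MS = 50      # separation between successive codes (about 0.05 s in the text)


def schedule(symbols, tone_ms=TONE_MS, gap_ms=GAP_MS):
    """Return (symbol, start_ms, end_ms) for each symbol in order."""
    slots = []
    start = 0
    for sym in symbols:
        slots.append((sym, start, start + tone_ms))
        start += tone_ms + gap_ms
    return slots


slots = schedule("065")   # the three digits coding the character "A"
```

Because every slot has the same length and spacing, a decoder can locate the k-th symbol at
k * (TONE_MS + GAP_MS) without any further synchronization information.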

In the embodiment shown in Fig. 5d, a mean value <R> 650 of all the ratios R(x,y,n) is derived.
The mean value <R> allows the ratios R(x,y,n), ... to be normalized, resulting in a corresponding set
of normalized ratios F(x,y,n), ... The normalization of the ratios R(x,y,n), ... takes into account
that the frequency magnitudes evaluated incorporate magnitude contributions of the original
audio signal s0(t). The normalization maps (centers) the ratios R(x,y,n), ... around 1, which makes
the evaluation of the obtained ratios R(x,y,n) or normalized ratios F(x,y,n), ... much easier.

The mean value <R> 650 may be obtained by a component for calculating a mean value from a
plurality of individual values. Typically, the arithmetic mean, the geometric mean and the square
mean are known mean values. Further different means, especially weighted or weighting means,
may be employed for normalizing.
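
The normalization may be sketched as follows, assuming the arithmetic mean is chosen for <R>. The
ratio values and the function name normalize are hypothetical.

```python
from statistics import fmean


def normalize(ratios):
    """Divide every ratio by the arithmetic mean <R> of all ratios."""
    mean_r = fmean(ratios.values())
    return {pair: r / mean_r for pair, r in ratios.items()}


# Hypothetical ratios R(x, y, n) for a fixed n.
R = {("x", "y"): 6.0, ("y", "x"): 1 / 6.0, ("x", "z"): 4.8, ("z", "x"): 1 / 4.8}
F = normalize(R)
mean_f = sum(F.values()) / len(F)   # the normalized ratios are centered around 1
```

A geometric or weighted mean could be substituted in normalize without changing the rest of the
evaluation.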

The following examples represent embodiments of the present invention which employ the
aforementioned capability for embedding information directly into an audio sequence.

Example: prepending/appending tags and/or meta-information to an audio sequence 

The appending of context information to an audio sequence is preferable since this information is
immediately available with the audio sequence without requiring handling of additional
information and without having to ensure the availability of the additional information.

The method for embedding information of any type into an audio sequence according to an
embodiment of the invention can also be employed to append information at the beginning of the
audio sequence, in the following designated as prepending, and to append information at the end
of the audio sequence, in the following designated as appending. In order to prepend and/or
append information to an audio sequence, an audio sequence coding silence is prepended and/or
appended to the original audio sequence. The respective audio sequence coding silence is used to
embed information as described in combination with the aforementioned method for embedding
information into an audio sequence, which forms audio signals prepended and/or appended to the
audio sequence.

The following constructed audio sequence exhibits prepended and/or appended information arranged
around an original audio sequence. The constructed audio sequence may be established as follows:

BEGIN_TAG tone
TAG tones for coding prepending information
END_TAG tone
BEGIN_CLIP tone
audio sequence (with or without embedded information/tag)
END_CLIP tone
BEGIN_TAG tone
TAG tones for coding appending information
END_TAG tone
START_METADATA tone
METADATA tones for coding meta-information
END_METADATA tone
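
The layout above may be pictured as a flat list of marker tones and payload items. The following
sketch is illustrative only; the function name build_sequence and the representation of tones as
strings are assumptions.

```python
def build_sequence(prepend_tag, clip, append_tag, metadata):
    """Return the constructed sequence as a flat list of marker and payload items."""
    return (["BEGIN_TAG", *prepend_tag, "END_TAG"]
            + ["BEGIN_CLIP", clip, "END_CLIP"]
            + ["BEGIN_TAG", *append_tag, "END_TAG"]
            + ["START_METADATA", *metadata, "END_METADATA"])


# Digits stand in for TAG and METADATA tones; the clip is an opaque placeholder.
seq = build_sequence(list("2002"), "audio samples", list("42"), list("060"))
```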

In detail, the prepending information is coded by an initial (BEGIN_TAG) tone, which is a
signal with a certain frequency or with certain multiple frequencies, one or more TAG tones for
coding information, and a final (END_TAG) tone, which is again a signal with a certain
frequency or with certain multiple frequencies. The initial one or more TAG tones shall make it
possible to identify the audio sequence uniquely; therefore, the initial one or more TAG tones include for
example time information about the recording of the audio sequence (for example UTC-encoded),
identification information about the terminal/device which has been used for recording
(for example an IMEI code of a cellular terminal) and further identification information. The
combination of the above presented time and identification information provides for unique
identification since only one audio sequence (voice message/mail) can be recorded by one
given terminal/device at one time.

The one or more TAG tones may be used to code the numbers from 0 to 9, which have a simple
implementation as musical tones. The TAG for such a clip, for example created on January 15,
2002, at 9:45:17 UTC on terminal 357149/19/415292/7, can be coded as the following number
sequence: 2-0-0-2-0-1-1-5-0-9-4-5-1-7-3-5-7-1-4-9-1-9-4-1-5-2-9-2-7.
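
The number sequence of this example can be reproduced with a short sketch. The function name
tag_digits and the choice of the timestamp layout YYYYMMDDhhmmss followed by the IMEI digits are
inferred from the example above and should be read as assumptions.

```python
from datetime import datetime, timezone


def tag_digits(recorded_at, imei):
    """Concatenate the UTC timestamp (YYYYMMDDhhmmss) and the IMEI digits."""
    stamp = recorded_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    device = "".join(ch for ch in imei if ch.isdigit())   # drop the '/' separators
    return "-".join(stamp + device)


tag = tag_digits(datetime(2002, 1, 15, 9, 45, 17, tzinfo=timezone.utc),
                 "357149/19/415292/7")
```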

Analogously, the appending information is again coded by an initial (BEGIN_TAG) tone, which
is a signal with a certain frequency or with certain multiple frequencies, one or more TAG tones
for coding information, and a final (END_TAG) tone, which is again a signal with a certain
frequency or with certain multiple frequencies.

Moreover, the additional appending information is coded by an initial (START_METADATA)
tone, which is a signal with a certain frequency or with certain multiple frequencies, one or more
METADATA tones for coding meta-information, and a final (END_METADATA) tone, which
is again a signal with a certain frequency or with certain multiple frequencies. The
meta-information can be used to code information being associated with the original audio sequence,
such as address information (link information, URL information and the like), but also content
information which may be coded itself for example by using a markup language such as
hypertext markup language (HTML), wireless markup language (WML), extensible markup
language (XML) and the like. The method for embedding information into an audio sequence
allows information of any type to be coded by providing arbitrary code bases.

The tone system mentioned above can be used to create a basis for e.g. the ASCII table or a
subset of it, plus the few additional tones shown above. According to an embodiment of the
invention, the METADATA tones are implemented directly as code bases corresponding to the
ASCII character/symbol table. Code bases which allow a reliable 7- or 8-bit coding
scheme to be created in accordance with the aforementioned method are possible. An alternative coding scheme
for coding the ASCII character/symbol table refers to the use of the three-digit representation for
the ASCII characters/symbols in question. For example, the character "A" is represented by the
number sequence 065. A part of the corresponding coding table associating with each symbol of the
ASCII character/symbol table a unique three-digit number sequence is shown in the following
table:



  +     000  001  002  003  004  005  006  007  008  009
 040     (    )    *    +    ,    -    .    /    0    1
 050     2    3    4    5    6    7    8    9    :    ;
 060     <    =    >    ?    @    A    B    C    D    E
 070     F    G    H    I    J    K    L    M    N    O
 080     P    Q    R    S    T    U    V    W    X    Y
 090     Z    [    \    ]

where each character/symbol of the table has an associated column number and row number,
and where the respective column number and the respective row number are added to determine
the respective three-digit number sequence.

By using this ASCII coding, the ASCII representation for the HTML tag "<HEAD>" for example
would be 060-072-069-065-068-062, giving the tones 0-6-0-0-7-2-0-6-9-0-6-5-0-6-8-0-6-2. This
coding creates many redundant 0's, and some other coding scheme (for example omitting the
lowercase letters from the ASCII table) would enable just two digits to be used. However, this
requires standardization. The advantage of the pure ASCII table representation is that it is global
and well-standardized.
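
The three-digit coding can be checked with a short sketch. The function names to_three_digit,
to_tones and from_tones are hypothetical; the mapping itself is the decimal ASCII code padded to
three digits, as in the "<HEAD>" example.

```python
def to_three_digit(text):
    """Encode each character as its three-digit decimal ASCII code."""
    return [f"{ord(ch):03d}" for ch in text.encode("ascii").decode("ascii")]


def to_tones(text):
    """Flatten the three-digit codes into the digit sequence that is sent as tones."""
    return list("".join(to_three_digit(text)))


def from_tones(digits):
    """Inverse mapping: group digits in threes and convert back to characters."""
    joined = "".join(digits)
    return "".join(chr(int(joined[i:i + 3])) for i in range(0, len(joined), 3))


codes = to_three_digit("<HEAD>")
tones = to_tones("<HEAD>")
```

Since 7-bit ASCII codes never exceed 127, the leading digit is always 0 or 1, which is the
redundancy noted above.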

Corresponding code bases according to an embodiment of the invention to encode the numbers 0
to 9 required for embedding a three-digit encoded ASCII character/symbol table into an audio
sequence may have the following form, each comprising two different music tones being based on
the known description for representing music tones C, D, E, F, G, A and H:

"0": C, D
"1": C, F
"2": D, A
"3": D, F
"4": E, F
"5": E, H
"6": F, H
"7": C, G
"8": G, A
"9": E, H



The presented code bases represent advantageous code bases since all pairs of code bases have at
most one tone in common, and the two tones for each code are mostly separated by at least two
frequencies, which decreases inter-frequency interference. With this choice, it is possible to make
the codes nearly orthogonal (having a small correlation) and also separated enough from each
other that they are not strongly affected by small glitches in the original audio signal
(playback/recording).



The advantage of the code bases each comprising two music tones can be recognized by
assuming that the number which is coded is the number "7" comprising the music tones "C" and
"G". Further assume that this number is to be decoded from an audio sequence. The magnitude
signal M(x,n) may be expressed mathematically by:

where index i relates to frequencies (tones) included in the code basis {bi(x)}, δ(i,a) is equal to 1
in case i is one of the indices coding x, and otherwise 0, K is the attenuation factor and A(ω) is the
contribution of the original audio signal into which the number "7" is embedded. The aforementioned
mathematical expression can be divided into a portion-wise defined expression:



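$$
M(x,n) =
\begin{cases}
\sum_i A(n b_i) + K\left[A(n b_C) + A(n b_G)\right] & \text{in case } x = \text{"7"} \\
\sum_i A(n b_i) + K\,A(n b_C) & \text{in case } x = \text{"0" or } x = \text{"1"} \\
\sum_i A(n b_i) + K\,A(n b_G) & \text{in case } x = \text{"8"} \\
\sum_i A(n b_i) & \text{otherwise}
\end{cases}
$$

or summarized:

$$M(x,n) = \sum_i A(n b_i) + K\left[\delta(x,\{7,0,1\})\,A(n b_C) + \delta(x,\{7,8\})\,A(n b_G)\right]$$

As a very rough approximation, it could be assumed that over long times the various A(n bi) will be
equal, designated as a time mean value <A>, resulting in

$$\frac{M(x,n)}{\langle A \rangle} \approx 7 + K\left[\delta(x,\{7,0,1\}) + \delta(x,\{7,8\})\right]$$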

Assuming that the attenuation value K is pre-defined as a relatively small value K = 0.2, the
magnitude (amplitude) of the signal is equal to 7.4 in case the symbol '7' is embedded, whereas
the magnitude (amplitude) value of the signal is equal to 7.0 to 7.2 in case one of the other
symbols ('0' to '6', '8', '9') is embedded. The system must be capable of distinguishing
magnitude (amplitude) values with signal differences of roughly 5%.
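
The figures of this estimate can be reproduced with a few lines. The sketch uses only a subset of
the two-tone code bases listed above and the rough approximation that all A(n bi) equal the mean
value <A>; the function name relative_magnitude is hypothetical.

```python
TONES = ("C", "D", "E", "F", "G", "A", "H")
# Subset of the two-tone code bases listed in the description.
CODE_BASES = {"0": {"C", "D"}, "1": {"C", "F"}, "3": {"D", "F"},
              "7": {"C", "G"}, "8": {"G", "A"}}


def relative_magnitude(embedded, evaluated="7", k_att=0.2):
    """M / <A> when `embedded` is in the signal and the basis of `evaluated` is tested."""
    shared = len(CODE_BASES[embedded] & CODE_BASES[evaluated])
    return len(TONES) + k_att * shared


values = {sym: relative_magnitude(sym) for sym in CODE_BASES}
spread = (values["7"] - values["3"]) / values["7"]    # largest relative difference
```

Symbols sharing one tone with "7" ("0", "1" and "8") land at 7.2 and symbols sharing none at 7.0,
so the decision margin is a few percent of the total magnitude.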

Example: meta-information

In view of the organizing of user provided information with meta-information as described in
detail with reference to the personal logbook, such meta-information may be embedded directly
into such audio information. The advantage of a direct embedding is that the meta-information
is always present and retrievable without using any retrieval functionality as described in context
of the personal logbook. The meta-information can be used to code information being associated
with the original audio sequence, such as address information (link information, URL
information and the like), but also content information which may be coded itself for example by
using a markup language such as hypertext markup language (HTML), wireless markup language
(WML), extensible markup language (XML) and the like. The method for embedding information
into an audio sequence allows information of any type to be coded by providing arbitrary code
bases. The language encoding system and alphabet given above allow any clear text
information to be coded independently of its meaning and purpose. The alphabet described there can be used
to embed the letters of any markup language around or within the audio clip itself; in particular,
the markup codes could be embedded in the METADATA field described therein. These letters
can then later be retrieved from the audio clip by the aforementioned method and converted to a
machine-readable format of the markup language.

Example: watermarking an audio sequence 

Watermarking is a technique being used for example to code information about, for example, the
origin of electronic data and the producer of the electronic data in a manner which ensures that
the information is present and decodable from the electronic data even in case the electronic data
have been manipulated. Basically, watermarking was introduced to save copyright information of
digitally provided images. Such a technique is also of interest for being provided for digital audio
information.

The simplest watermark is just the aforementioned TAG information of the audio sequence. A
constant-time tone with a fairly long time, e.g. several tenths of a second or more, may be
employed, since it is then easier to identify the code statistically. More complex redundancy and
error correction schemes may also be added if so desired.

Moreover, it is possible to insert the TAG information and meta-information directly into the
audio sequence by embedding the information according to an embodiment of the invention. The
embedded information replaces an appending/prepending of the TAG information, which may
interfere with the audio reproduction in systems which do not recognize the prepending/appending
information as information being coded.

Example: authentication watermark 

The aforementioned method for embedding information of any type into an audio sequence can
be used to embed a hidden authentication watermark into the audio sequence. The authentication
watermark may be relevant in e.g. some legal situations, and allows verification of whether
modifications of the audio sequence have been made. The authentication watermark relies on
known digital signature algorithms.

In an embodiment of the invention, the aforementioned prepending information, i.e. the
prepending TAG, can be used as a basis for the authentication watermark. The prepending
information is encrypted resulting in an encrypted ENCTAG, for example by employing a strong
asymmetric encryption procedure, such as a public key encryption employing a public and a
private (secret) key. The encrypted ENCTAG is embedded into the original audio sequence.

The authentication of the embedded encrypted ENCTAG can be verified by the receiver of the
corresponding audio sequence, but only the sender (producer) is actually able to decrypt the
embedded encrypted ENCTAG.

The aforementioned method for embedding an authentication watermark can be improved by
obtaining a kind of checksum from the original audio signal to create a signature
which cannot be manipulated or exchanged, resulting in a more reliable authentication
watermark. The audio Fourier transform A(ω) is partitioned into several discrete segments of, for
example, a constant length such as a length of 0.1 seconds, and the frequencies are discretized into
a certain number of frequency sections such as 8 frequency sections. Discretized amplitude
values A'(ω) are obtained from the audio Fourier transform A(ω) and an amplitude mean value
<A>, i.e. an average amplitude, of the audio Fourier transform A(ω). In case the actual amplitude
value of the audio Fourier transform A(ω) is less than the amplitude mean value <A>, the
corresponding discretized amplitude value A'(ω) is set equal to zero, whereas in case the actual
amplitude value of the audio Fourier transform A(ω) is greater than the amplitude mean value
<A>, the corresponding discretized amplitude value A'(ω) is set equal to one.

The resulting discretized amplitude values A'(ω) comprise a sequence consisting of "0" and "1"
which can be interpreted as a bit sequence. Moreover, this sequence may be partitioned into parts
of 8 discretized amplitude values which are interpreted to form a sequence of 8-bit values over the
total time of the audio sequence. The sequence of 8-bit values can be treated as
aforementioned for embedding information into an audio sequence, in particular as an ASCII
coding employing 8 bits. Moreover, the sequence of 8-bit values may be encrypted using any
known encryption method, such as an asymmetric encryption method like public key
encryption. The encrypted sequence is finally embedded into the audio sequence and allows a
receiver of the audio sequence to separate the encrypted sequence from the audio sequence for
verifying the authentication of the audio sequence.
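
The derivation of the 8-bit checksum values may be sketched as follows. The input is assumed to be
a list of time segments, each holding the amplitudes of 8 frequency sections; the function name
checksum_bytes, the bit order (first section as most significant bit) and the treatment of an
amplitude exactly equal to the mean as zero are assumptions not fixed by the description.

```python
def checksum_bytes(segments):
    """Map each segment of 8 section amplitudes to one 8-bit value."""
    flat = [a for seg in segments for a in seg]
    mean = sum(flat) / len(flat)                 # amplitude mean value <A>
    values = []
    for seg in segments:
        value = 0
        for a in seg:
            bit = 1 if a > mean else 0           # discretized amplitude A'
            value = (value << 1) | bit           # first section is the top bit
        values.append(value)
    return values


# Two hypothetical 0.1 s segments with 8 frequency sections each.
segments = [[0, 10, 0, 10, 0, 10, 0, 10],
            [10, 0, 0, 0, 0, 0, 0, 0]]
signature = checksum_bytes(segments)
```

The resulting byte sequence would then be encrypted and embedded like any other information; the
encryption step is outside the scope of this sketch.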

The embedding of encrypted information may be hidden. The hiding of information embedded
in an audio sequence according to an embodiment of the invention can be achieved by employing
code bases unique to the user and varying in time. Additionally, normal code bases may be
employed to embed further information into the audio sequence in a transparent way (this
information is apparent to a receiver reproducing the audio sequence).

Referring to Fig. 5d relating to the extracting of embedded information, it is nearly impossible to
calculate M(x,n) from an audio sequence having embedded information in case the frequencies
of the code bases (the correlation basis) used originally for embedding the information are
unknown. Depending on the coding density (duration of time of each signal of a code basis), a
small value may be selected for the amplitude coefficient K, i.e. the attenuation K, which results
in "invisible" and "inaudible" embedded information, respectively.



Example: FAT coding 

An improved implementation of an embodiment according to the invention allows a file
allocation table (FAT) to be established analogously to the file allocation table technique known
from file handling and managing techniques in processing devices. Such a FAT comprises information about an
identification sequence, an associated name, a length, a location on the memory medium or the
storage device (such as a start time in serial media) and the like. The aforementioned capability
of mapping ASCII coding into a frequency based audio coding allows that information to be coded
correspondingly. ...
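
One possible way to picture an audio FAT entry is sketched below. The record layout (four fields
joined by a semicolon), the field names and the functions encode_entry and decode_entry are
hypothetical; only the list of fields and the reuse of the three-digit ASCII coding follow the
description.

```python
from dataclasses import dataclass


@dataclass
class FatEntry:
    ident: str        # identification sequence
    name: str         # associated name
    length_s: int     # length in seconds
    start_s: int      # location: start time in a serial medium, in seconds


def encode_entry(entry, sep=";"):
    """Serialize the entry as text and map it to three-digit decimal ASCII codes."""
    text = sep.join([entry.ident, entry.name, str(entry.length_s), str(entry.start_s)])
    return "".join(f"{ord(ch):03d}" for ch in text)


def decode_entry(digits, sep=";"):
    """Inverse of encode_entry()."""
    text = "".join(chr(int(digits[i:i + 3])) for i in range(0, len(digits), 3))
    ident, name, length_s, start_s = text.split(sep)
    return FatEntry(ident, name, int(length_s), int(start_s))


entry = FatEntry("0007", "MEMO", 42, 300)
digits = encode_entry(entry)      # digit string to be sent as tones
restored = decode_entry(digits)
```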

For robustness and security reasons, it may be recommended to re-write the audio FAT
information embedded in an audio sequence each time the audio FAT information changes and
to preserve the previous audio FAT information until the new audio FAT information coding is
completed.

The retrieving of audio FAT information embedded in an audio sequence requires a complete
processing of the audio sequence to extract that information therefrom. Depending on the
available processing capability employed for retrieving the audio FAT information, the extracting
process may be time-intensive, such that a separate storing (mirroring) of the FAT information
independently from the audio sequence is advantageous to preserve processing capabilities and
processing time. A finalizing operation may ensure that the FAT information being provided
separately (mirrored) and the FAT information being embedded in the audio sequence are
identical. The finalizing operation may be operated as a synchronization operation, i.e. at a
certain time interval or at a certain event at which the synchronizing of the FAT information is
useful to prevent any unsynchronized situations.

The audio FAT information may be stored at a fixed position in the audio sequence, for example
pre-defined by a pointer referring to a certain moment of time in the audio sequence. Moreover,
the duration of time required for storing the audio FAT information may be allocated
permanently and unambiguously for the audio FAT information.

It will be obvious for those skilled in the art that as the technology advances, the inventive
concept can be implemented in a broad number of ways. The invention and its embodiments are
thus not limited to the examples described above but may vary within the scope of the claims.



